CORPORATE DECISION MAKING:
AN EMPIRICAL STUDY 1


ROSS STAGNER 2 


Wayne State University 


A questionnaire regarding corporate decision-making (dm) practices was mailed 
to 500 vice-presidents of 125 large firms. Response rate was 50%. Data indicate
that many goals other than profit maximization are important in decisions, and 
that estimates of marginal costs and profits are not always carefully made. 
Profitability and executive satisfaction with decision-making practices are posi- 
tively correlated. Factor analysis reveals at least three important dimensions 
of dm process: managerial cohesiveness, formal procedures in dm, and centrali- 
zation-decentralization. Factor scores derived from these factors were signifi- 
cantly different for firms in top and bottom thirds on profitability. However, 
these scores did not predict increases or decreases in profitability over a 7-yr. 
time span. Interpretation favors the view of the corporation as a coalition, with 
social role and personal bias of the executive affecting his decisions. Participative
practices are supported as both satisfying and profitable.


Most of the literature on decision making 
in large corporations is of a highly abstract, 
theoretical, normative type. It sets forth, with 
impressive mathematical treatment, the deci- 
sion processes in which corporation executives 
should engage if a number of quite unrealistic 
assumptions can be met. The articles have 
been characterized, perhaps unkindly, as our 
modern version of “how many angels can 
dance on the head of a pin?” 

A second category, including most of the 
remaining publications on corporate decision 
making, includes memoirs of successful execu- 
tives. Like case studies in the field of clinical 
psychology, these sometimes offer intriguing
hypotheses about the process under investiga- 
tion, but no acceptable data to support the 
proposed theory. 


1 This research was financed by a grant from the 
Ford Foundation for the academic year 1963-1964.
The Foundation is in no way responsible for the 
contents of this report. 

2 Requests for reprints should be sent to the
author, Department of Psychology, Wayne State 
University, Detroit, Michigan 48202. 
The field has been dominated by classical
economic theory. The major assumptions
about corporate decision making (dm) are as
follows: (a) the firm is a unit; (b) the firm
acts to maximize profit; (c) the firm is com- 
pletely informed about alternative courses of 
action, consequences of each alternative, and 
the probabilities of these consequences 
(Simon, 1959). Some years of study of indus- 
trial conflict (Stagner, 1956) were convincing 
that these were contrary to fact. In an indus- 
trial dispute, top executives usually produce 
recommendations for corporate action, many 
of which are mutually contradictory, and the 
settlement of the dispute inside management 
often is nearly as difficult as that with the 
union. Second, the firm often acts on power 
considerations, or even on the basis of main- 
taining a public image, rather than on profit 
considerations. Third, members of the firm 
are often woefully ignorant about alterna- 
tive courses of action and their probable 
consequences. 

In recent years there has been some increase 
in empirical research on corporate dm, most 
of which has had the effect of further shaking
confidence in the relevance of the classical 
assumptions. Simon (1959, 1960) has cited 
numerous instances in which the unity of the 
firm was a fiction. White (1961) documents 
many conflicts at the executive level, espe- 
cially between functional departments. Dalton 
(1959) provides intriguing instances of the 
conflict between outlying branches and a cen- 
tral corporate office. Stagner (1965) offers 
various examples of disputes between vice- 
presidents in large- and medium-sized corpora- 
tions. March (1965) shows that a firm has 
some characteristics of a political coalition, 
composed of conflicting subunits. 

With respect to profit maximization as a 
goal, the empirical data also lead to rejection 
of the assumption. Soelberg (1967) has 
stressed the importance of individual goals 
which may have nothing to do with profit 
maximization. Simon (1960) points out that 
most executives accept a “satisficing” policy 
rather than an optimizing alternative. Stagner 
(1965) has shown that suboptimization may 
be quite common, as when a corporate policy 
is a compromise between what is optimal for 
a subunit and what is optimal for the entire 
firm. Feldman and Kantner (1965) point out 
that the alleged rule of profit maximization, 
if defined precisely, often fails to predict 
the decision made by a firm; and Mueller, 
Wilken, and Wood (1961) document this 
logical point with case studies in which an 
owner disregarded cost estimates in making 
plant location decisions. 

Simon (1960) has been particularly inter- 
ested in the assumption of perfect knowledge 
and perfect rationality in dm. He notes that 
information costs money, and most manage- 
ments stop searching for alternative courses 
of action when they locate a “satisficing” 
option. Relevant case studies are those of 
Cyert, Simon, and Trow (1956), and Cyert, 
Dill, and March (1958). Marschak and 
Radner (1954) point out the difficulty of 
perfect communication from one member of 
the firm to another, and hence the unavailabil- 
ity of all the information in the dm process. 

Psychologists have been concerned with the 
importance of perceptual bias in the handling 
of information. Cyert, Simon, and Trow 
(1956) gave identical case histories of a firm
to 23 executives in a training program. The
selective perception of information is indicated
by their answers to the question: “What is the
most important problem facing the new president
of this firm?” Of the sales executives,
83% named a sales problem, while only 29% 
of nonsales officials mentioned sales. Stagner 
(1965) reports instances in which production 
and sales managers sponsored diametrically 
different solutions to what was ostensibly the 
same problem. Zalkind and Costello (1962) 
have offered interpretations of the literature 
on perception as an aid in understanding dif- 
ferences in managers’ choice of information to 
guide a decision. Appropriate to their remarks 
is the observation by Bowman (1961) that the 
“operations analyst” may fail to perceive some 
important item which is obvious to a working 
manager. Bowman advocates use of varied in- 
formation sources to minimize such oversights. 

Much of the “information” used in dm is 
biased by executive wishes and expectations. 
Cyert, Dill, and March (1958) suggest that 
staff personnel first decide whether the idea 
is good, then marshal data to support their 
view. Their paper gives a detailed account of 
the data-gathering process and cost estimates 
for installation of an electronic data process- 
ing unit in a medium-sized corporation. It 
sounds very impressive until they quote a 
staff member as saying, “In the final analysis, 
if anybody brings up an item of cost we 
haven’t thought of, we can balance it by 
making another source of savings tangible [p. 
340].” Similarly, Stagner (1965) quotes a
corporate vice-president on the question of
cost figures: “The salesmen handling this line
wanted to have unit cost data. I opposed 
giving it to them, partly because they might 
unintentionally reveal it to a competitor, but 
more because these cost figures are in some 
respects artificial [p. 17].” Thus the fancy 
mathematical solutions developed by the op- 
erations research staff apparently incorporate 
highly subjective estimates of various cost 
factors. 

Not only is the information biased; in many 
instances it is ignored. Consider the instance 
in which careful market research led to “a 
proposal to set up three installations, each 
costing $5 million, two in the United States 
and one in western Europe. The head of the 
English subsidiary and the head of the French
subsidiary got into a feud over which would
get the European unit. After considerable negotiating,
the American controlling executives
decided to put one each in France and England
[Stagner, 1965, p. 17].” In this instance
detailed staff work had indicated that one 
installation was adequate to the foreseeable 
European market. The key issue was the rel- 
ative status and power of the two European 
executives, and $5 million was the “side pay- 
ment” to keep peace in the organization. This 
is an instance of “suboptimization,” a com- 
promise between optimum for the subunit and 
optimum for the entire firm. It suggests that 
profit is only one of many goals which de- 
termine corporate decisions. 

Economic theorists have not ignored abso- 
lutely the conflict of their assumptions with 
empirical reality. In an intriguing effort to 
incorporate some of these observations, Har- 
sanyi (1962) suggests that optimizing equa- 
tions be rewritten to include such factors as 
the opportunity cost to A of getting and using 
power, the cost to B of refusing to yield to A, 
and the personal affection of B for A. Ob- 
viously such a formulation deviates rather far 
from the simple profit-maximization approach. 

These observations suggest that there is an 
urgent need for research on corporate dm 
processes which is theoretically based but 
reasonably close to empirical reality. This 
study offers a beginning on that task. 


METHOD
A Theoretical Position 


It seems appropriate to approach the problem of
corporate dm by making the following assumptions:
(a) corporate policy is established by persons occupying
certain role positions in the organization; (b)
the behavior of these persons is determined in part
by role prescriptions, and in part by personal motives;
(c) perceptions of corporate resources, alternative
actions, utilities, and probabilities of outcomes
will be affected by role-induced experiences
and by personal experiences; (d) policy proposals
by different executives will reflect these divergent
perceptions and motivations; and (e) corporate decisions
will represent compromises among these, affected
by the power of proponents as well as by
logic and realistic data.

Research on this kind of complex phenomenon
ideally should involve analysis of all relevant documents,
video recordings of all relevant conversations
and conferences, depth explorations of each executive
to ascertain conscious and unconscious desires,
symbols, etc., affecting his policy preferences, and
an assessment of the power fields of the various
executives. For many reasons such research is not
presently feasible. As a substitute we may rely on
reports of dm from participant executives, and try
to ferret out the process from these subjective data.
The task is difficult. Phenomenological reports are
slippery even when made by trained observers having
no aspirations likely to bias their observations. Industrial
executives are, for this purpose, untrained
observers, and it is assumed that they will introduce
biases into their reports. Nevertheless, this seems the
only suitable source of data at this time for a study
of high-level dm processes.

Fig. 1. A corporate decision situation as
perceived by one executive.
[Figure labels: “modifies perceived probability”; “Some valences concealed or unconscious”; “Executive M”; “unobserved alternative.”]

Our approach proposes that an individual execu- 
tive view a problem situation as sketched in Fig. 1. 
He is fully aware of a major corporate goal (which 
need not be profit; it may be competitive standing, 
public “image,” political advantage, etc.) and he is 
aware that attainment of this goal is impeded by 
some difficulty; there is a discrepancy between the 
existing state and the preferred state of affairs. He 
is also affected by, but not necessarily conscious of, 
various other positive and negative goals. Some of 
these relate to his department, division, or staff posi- 
tion in the corporation; others to his personal power, 
prestige, or profit. He sees possible courses of ac- 
tion, positive and negative utilities, and probabili- 
ties of various outcomes. 

Another executive perceives the situation in a 
slightly different way (Fig. 2); his alternatives, utili- 
ties, and probabilities reflect his role position and 
his personal needs. Discussion of the issue by the 
two persons results in overlapping phenomenal fields 
(Fig. 3), but the theory assumes that these never 
coincide perfectly. 
Fig. 2. The decision situation of Fig. 1 as perceived
by another executive.

Fig. 3. The joint field of the two executives.
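
The divergence between two executives' phenomenal fields can be illustrated with a small numerical sketch. It is an illustration under stated assumptions, not part of the study: it treats each executive's preference as a subjective expected value (perceived probability times valence), and every name and number in it (`Alternative`, `expected_value`, `preferred`, the two alternatives, the probabilities and valences) is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alternative:
    name: str
    probability: float  # chance of success as this executive perceives it (assumed)
    valence: float      # net attractiveness of the outcome to this executive (assumed)


def expected_value(alt: Alternative) -> float:
    # Assumption: preference is modeled as perceived probability times valence.
    return alt.probability * alt.valence


def preferred(field: List[Alternative]) -> str:
    # The alternative this executive would sponsor under the assumption above.
    return max(field, key=expected_value).name


# The same two alternatives as perceived from two different role positions.
executive_m = [
    Alternative("expand plant", probability=0.7, valence=10.0),
    Alternative("cut prices", probability=0.4, valence=6.0),
]
executive_n = [
    Alternative("expand plant", probability=0.5, valence=4.0),
    Alternative("cut prices", probability=0.6, valence=9.0),
]

print(preferred(executive_m))
print(preferred(executive_n))
```

With these made-up figures the two executives sponsor different alternatives for what is ostensibly the same problem, which is the situation the overlapping fields are meant to depict.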


Adjustment of the remaining differences may be 
brought about by (a) cognitive modification (data 
which induce changes in one phenomenal field) or 
(b) dynamic modification, involving “side payments”
or rewards and punishments which bring
about modification of valences. The solution to the 
problem may result from the introduction of a 
power field (e.g., the chief executive, Fig. 4) which 
removes all alternatives but one. It may, on the 
other hand, be a compromise between the solutions 
preferred by the subordinates, as in Figure 5. In ex- 
treme cases the solution involves replacement of one 
or more executives. 

The dm in such a theoretical context will be af- 
fected by various objective environmental constraints 
(the “foreign hull” of Lewin’s lifespace), by subjec- 
tive constraints (the executive’s refusal to consider 
some alternatives), by the internal pattern of com- 
munications and authority relations, by the traditions 
of the firm, etc. Each of these can be inferred at 
some unspecified level of precision from reports given 
by executives of their participation in the dm 
process. 


Fig. 4. Decision by blocking one alternative.


Hypotheses


While this was designed as an exploratory study,
the following hypotheses were set up for testing:


1. Profit maximization will be the only goal re- 
ported by executives. 
2. All corporations make decisions in the same
manner, i.e., there are no significant differences in
style of dm.

3. An executive’s power status within the firm 
does not affect his part in the dm process. 

4. Profitability is unrelated to variations of man- 
ner of making decisions. 

5. Personality variables are nonsignificant in the 
corporate dm process. 


Procedure 


This study develops from an earlier project (Stag- 
ner, 1965) in which unstructured interviews explored 
executive perceptions of the corporate dm process. 
Analysis of the interview material led to the formu- 
lation of 28 questions which seemed to merit quan- 
titative analysis. These were classified into six 
groups, although these categories are useful primarily 
for noting relations to theory, not for statistical 
analysis. The groups, and illustrative items, were:
Goal variables: cost and profit estimates, company
tradition, corporate image. Means variables: channels
of communication, lines of authority, speed of
decisions, formal routines, use of ad hoc committees,
use of outside consultants. Leadership variables:
chief executive talks to one vice-president at a time;
chief executive is concerned that all be satisfied.
Role variables: conflict between central office and
divisions; vice-presidents exaggerate importance of
their division. Interaction variables: groups among
top executives; tension among top executives;
discussion among all persons affected; loser (on a
decision) feels defeated. Outcome variables: satisfaction
with decision-making procedure; morale of top
executives. (In addition, profits as percentage of 
capital and profits as percentage of sales were used 
as outcome measures but these were not in the 
questionnaire.) 

The questions were formulated so that they could 
be answered by making check marks on a 7-step
scale, to avoid the objection to forced yes-no an- 
swers. The wording of the items and the definitions 
of the ends of each 7-step scale, with the answers 
from 217 vice-presidents, are given in Table 1. 


Fig. 5. Decision by compromise (partial deprivation
for each executive).
TABLE 1
QUESTIONNAIRE ITEMS AND RESPONSES

Item, followed by the ends of its 7-step response scale

 1. Relative speed with which top-level decisions are made
      somewhat slower than similar companies / considerably faster than similar companies
 2. Concern over formal steps in decision-making at top level (regular meetings, written records, etc.)
      much attention to formal routines / little attention to form
 3. Estimates of cost and anticipated profit to result from a decision
      always carefully computed / rough estimates only
 4. Discussion among all top executives
      discussions include all executives affected / most include only two men at a time
 5. Use of a top-level policy committee or operating committee
      we have none / it merely approves decisions already made / it is an active decision-making apparatus
 6. Use of ad hoc or special committees for single projects
      common practice / usually one person, not a committee
 7. Tendency of each vice-president to exaggerate importance of his own area
      serious problem here / not a serious problem
 8. Social interaction of top executives outside office hours (for nonbusiness purposes)
      frequent / rare
 9. Concern of chief executive for detailed information on which to base decision
      wants substantial detail / prefers broad outlines
10. Concern of chief executive that all executives are satisfied with the decision
      considerable concern / minor concern
11. Concern with “going through channels”
      relatively little attention to this / communications always observe channels
12. Importance attached to company tradition and past policies
      not much; easy to break tradition / considerable weight attached to tradition
13. Concern of chief executive for “unanimous agreement”
      very reluctant to confirm policy if anyone opposes / approves policy when he sees a clear majority
14. Reaction of executives to decisions which go against their preference
      often feel “defeated” in such a case / usually accept decision without feeling “defeat”
15. Preferred style of chief executive
      talk with one man at a time / talk with all interested men together
16. Normal operation of divisions within company
      most divisions highly independent / little divisional autonomy
17. Tendency of chief executive to give the “losing” executive some other concession to make him feel better
      often happens / [illegible]
18. Chief executive’s preference for division heads to be partisan of division or look at company as a whole
      prefers strong division partisanship / wants all to think chiefly of company, not division
19. Use of outside consultants
      used on most important issues / rarely used by this company
20. Clear lines of authority
      everyone knows and respects lines of authority / lines of authority ambiguous, often ignored
21. Importance attached to company “image” as seen by public
      often outweighs cost factors / would have little effect
22. Importance of personalities in decisions at this level
      vigorous, persuasive individual often wins point / importance of function to company usually decisive
23. Importance of divisional vs. central office disagreements
      few such disagreements / these are fairly common
24. Ability of strong divisions to get their own way
      strong divisions win if deeply concerned / no differences among divisions
25. Groups within top executive echelon
      some men habitually vote together as a group / no tendency toward alignments
26. Amount of tension within top executive group over a difficult decision
      no tension or personal feelings / tension sometimes high, personal frictions
27. Your estimate of morale of top executive echelon
      high / [illegible]
28. Your feeling of satisfaction with the way these decisions are handled
      very well satisfied / some satisfaction

[The number of VPs checking each of the seven scale steps, and the number marking each item “Very important,” are illegible in this copy.]
Sample of Firms


The questionnaire as described was sent to 500
persons at the vice-presidential level (VPs). Sampling
was as follows: from the Fortune Magazine
list of the 500 largest American corporations (July,
1963), 125 were selected by taking every fourth
name.3 From Standard and Poor’s Registry of Directors
were obtained the names of four VPs per firm
(including when necessary treasurers, controllers, etc., if
not enough VPs were listed). Thus, 500 questionnaires
were mailed. These were coded to identify
firms but not individual respondents.
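
The sampling arithmetic can be checked with a short sketch. It assumes that “every fourth name” means the 4th, 8th, 12th, ... entries of the ranked list (the starting point is not stated), it uses placeholder firm names, and it ignores the few substitutions mentioned in the footnote; `systematic_sample` is a hypothetical helper, not a procedure taken from the study.

```python
def systematic_sample(ranked_names, step=4):
    """Return every `step`-th name, starting with the `step`-th entry (assumed start)."""
    return ranked_names[step - 1::step]


# Placeholder stand-in for the ranked list of 500 corporations.
ranked_list = [f"Firm {rank}" for rank in range(1, 501)]

firms = systematic_sample(ranked_list, step=4)
vps_per_firm = 4
questionnaires_mailed = len(firms) * vps_per_firm

print(len(firms), questionnaires_mailed)
```

Under these assumptions the sketch yields 125 firms and 500 questionnaires, matching the figures reported above.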

Returns were received from about 260 individ- 
uals. All four officials replied from 6 firms; three 
from 29 firms; two from 46 firms; and only one 
from each of 28 firms; but a number of these were 
incomplete. The final analysis is based on 217 re- 
sponses from 109 firms. 


RESULTS 


The distribution of 217 executives from 109 
firms, in terms of their answers to specific 
items, is given in Table 1. Some interest will 
attach to these in the light of points made 
above. For example, it is obvious that in many 
firms, cost and marginal profit estimates are 
not carefully made (Item 3). In fact, a sub- 
stantial number (28%) indicated that “rough 
estimates” were made of such variables. Table 
1 also indicates that company “image” may 
outweigh profit considerations (Item 21) and 
adherence to tradition may also be an important
value (Item 12). These data merely
confirm observations already made regarding 
the importance of goals other than profit 
maximization. They lead to rejection of Hy- 
pothesis 1 (that profit will be the only goal 
reported). 

The first seven columns in Table 1 show 
the number of VPs checking at each of the 
seven steps on the answer scale. Column 8 
gives the number checking that item as very 
important. Various computations involving 
this datum led to absolutely no meaningful 
results, and the conclusion follows that these 
respondents were not good judges of the rela- 
tive importance of various items in relation 
to dm procedure or outcome. 


3 A few substitutions were made, e.g., when the
above process turned up atypical organizations such
as a large farmers’ cooperative. In such instances the
replacement was the next firm on the list.


TABLE 2

CORRELATIONS BETWEEN PAIRS OF EXECUTIVES
DESCRIBING THE DECISION-MAKING PROCESS

Group                            Mean z    Mean r
A (intra-firm)                   .463*     .433*
B (random multiple-response)     .330      .319
C (random single-response)       .310      .300

Note. t (A-B), 2.78, p < .05; t (A-C), 4.95, p < .05;
t (B-C), 0.59, ns.


Reliability of Data 


To talk about the dm process in a firm, 
it must be shown that it can be described 
with sufficient precision as to be different from 
some other firm. To test this, the authors 
took 52 pairs of men from the multiple- 
responding firms and computed profile correla- 
tions (the pattern of responses on one ques- 
tionnaire against that on another). Only one 
pair was taken from any given firm (Group A, 
Table 2). Then each of these 52 was paired 
against a man from a different firm. These are 
shown as Group B. As another control we 
took 190 random pairs from firms sending in 
only a single response (Group C). For pairs 
within the same firm the actual range of cor- 
relations was from +.79 (rather high agree- 
ment) to +.01 (no agreement at all). The 
mean (Table 2) is +.43. For Group B (same 
men, paired across firms) the range is +.73 to 
—.27, with a mean of +.319. In Group C 
(random pairs from firms with one respondent) 
the range is +.86 to —.53; mean, +.30. The 
A-B and A-C differences meet the .05 level 
for a two-tailed test. Thus we can say con- 
servatively that two VPs from the same firm 
will agree more in their responses than two 
from different firms, and the hypothesis of an 
observable event (the dm process) is sustained.
(Conversely, it is a bit discouraging
that the agreement within the firm is not
higher; it suggests that the unity of the firm
is even less than suggested earlier.) We thus
reject Hypothesis 2 (that the dm process is
the same in all firms).
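
The profile-correlation procedure can be sketched as follows. This is a minimal illustration, not the original computation: the two response profiles are invented, the helper names `pearson_r` and `mean_r_via_fisher_z` are hypothetical, and the use of Fisher's z transformation for averaging is an assumption. That assumption is at least consistent with Table 2, whose two mean columns are related by r = tanh(z) to three decimals.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two response profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_r_via_fisher_z(correlations):
    """Average correlations on the z scale; return (mean z, back-transformed mean r)."""
    zs = [math.atanh(r) for r in correlations]
    mean_z = sum(zs) / len(zs)
    return mean_z, math.tanh(mean_z)


# Invented 7-step answers of two VPs from one firm (eight items only, for brevity).
vp_1 = [2, 5, 3, 1, 6, 4, 6, 3]
vp_2 = [3, 5, 2, 2, 7, 4, 5, 4]
profile_r = pearson_r(vp_1, vp_2)

# Consistency check on the reported group means in Table 2: tanh(mean z) vs. mean r.
table_2 = {"A": (0.463, 0.433), "B": (0.330, 0.319), "C": (0.310, 0.300)}
discrepancies = {g: abs(math.tanh(z) - r) for g, (z, r) in table_2.items()}

print(round(profile_r, 3), discrepancies)
```

Averaging on the z scale rather than averaging raw correlations is a common choice because the sampling distribution of r is skewed when correlations are far from zero.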

An examination of these profile correlations
suggests that in some firms there is high communication
and mutual understanding, while
in others there is fractionation and dissociation.
For one group of three VPs in the same
firm, the correlations are .66, .79, and .64; for
another set of three, the figures are .18, .13,
and .38. This fact fits with other data to be
mentioned later.


TABLE 3

ITEMS RECEIVING HIGHEST LOADINGS ON THE
UNROTATED MATRIX, FACTOR I

Item                                                        Loading
28. Your feeling of satisfaction with way
    decisions are handled (very well)                       [illegible]
27. Your estimate of top executive morale (high)            .73
20. Clear lines of authority (yes)                          .65
 4. Discussion among all top executives (all incl.)         .59
26. Tension within top executive group (rare)               [illegible]
25. Groups within top executive echelon (rare)              .54
11. Concern with going through channels (yes)               .54

Note. The answer given in parentheses defines the positive
end of the 7-step scale. See Table 1 for the complete items. For
N = 217, a correlation of .30 is significant at the .01 level.


Dimensions of the DM Process 


One major concern of the study was to 
identify styles of dm in corporations, and the 
tactic utilized for this purpose was factor 
analysis. Each of the 28 questionnaire items, 
plus size, profit on sales, and profit on capital, 
was correlated with all the others, and sub- 
jected to a principal axes analysis. Seven fac- 
tors emerged, of which four had variances 
above 1.0. Only the first two seemed to make 
sense in terms of everyday knowledge of cor- 
poration functioning. Factor I is heavily 
loaded on executive morale (Table 3). This 
seems plausible in view of the number of
questions asked which would bear, in one
way or another, on the satisfaction of executives
with the dm process. Teachers of industrial
psychology and management courses
will be pleased to note that “going through
channels” and “clear lines of authority” favor
high morale. Profitability is positively but not
heavily loaded on this factor.


TABLE 4

ITEMS RECEIVING HIGHEST LOADINGS ON THE
UNROTATED MATRIX, FACTOR II

Item                                          Loading
30. Profit as % of sales                      .47
31. Profit as % of capital                    .42
 2. Formal routines (yes)                     .39
 6. Ad hoc committees (common)                .34
19. Outside consultants (often)               .34
 1. Speed of decisions (slow)                 [illegible]
14. Losing executive feels defeated (yes)     [illegible]

Factor II has its highest loadings on the 
two profitability indexes (Table 4). In addition,
it includes several “means” items: “concern
over formal steps,” “use of ad hoc committees,”
“outside consultants,” and relatively
slow decisions. One possible interpretation, 
based partly on independent knowledge of the 
firms, is that this factor weights profitability 
based on strong decentralized divisions, as 
opposed to a cohesive central administration. 
Further support for this view derives from 
Table 5, showing items which reverse sign 
from Factor I to II. It will be noted that 
the loadings in Factor I point to an inte- 


TABLE 5 


Items WuicH REVERSE SIGN FROM 
Factor I to Factor II 








Answer positively loaded on ~ 
Item 


7 ay 





Each VP exaggerates 
importance of his 
division 

Executives feel ‘“‘de- 
feated”’ if losing de- 
cision 

Chief executive gives 
some other conces- 
sion to loser 

Lines of authority 
clear 

Corporate function 
more important 
than vigorous per- 
sonality 

Conflicts of divisions 
vs. central office 

Strong divisions get 
own way 

Groups within top 
echelon of execu- 
tives 

Tension among execu- 
tives over tough 
decisions 


Yes 


Yes 


Yes 


Maybe 


Not always 
Few Common 


Not usual | If deeply concerned 


No Some 


No Sometimes high 


a 
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TABLE 6 


Items Recrtvinc HicHEst LOADINGS ON THE 
Roratep Matrix, Factor I’ 


TABLE 7 


Items RECEIVING HiGHEsT LOADINGS ON THE 
Rotated Matrix, Factor II’ 





Item Loading 
27. Estimate of top executive morale (high) .67 
26. Tension at top (low) .63 
14. Losing executive feels defeated (no) 58 
28. Your satisfaction with procedure (high) Oil 
23. Conflicts between central office and di- 
visions (rare) 0 
7. Vice-presidents exaggerate area import- 
ance (not serious) 53 


25. Groups at top (no) 44 


grated structure, while II points to a number 
of autonomous units under a single corporate 
roof. Both factors point to rejection of Hy- 
pothesis 4 (profitability not related to man- 
ner of dm). 

Most factor analysts would argue that ro- 
tation of factors to simple structure provides 
the best approach to identifying meaningful 
dimensions in a mass of correlational data. 
The seven factors were rotated by Varimax, 
giving quite a different pattern from the un- 
rotated structure.* 

Factor I’ (Table 6) confirms the suggestion 
of a dimension of managerial cohesiveness. 
Firms high on this factor have managers 
skilled at working together; but the items 
give us no hint of how this coordination was 
achieved. 

Factor II' (Table 7) might be called ‘“for- 
mality in dm”; it has elements of participa- 
tive management style and also essential bu- 
reaucratic procedures. The style of the chief 
executive seems to be important here in keep- 
ing a “tight ship” but at the same time lis- 
tening to all concerned and maintaining high 
morale. It is a little surprising that, statisti- 
cally, this is independent of Factor I’. 

Factor III’ has precisely two items loading 
significantly—profit on sales and profit on 
capital. This seems to confirm the factor ana- 
lyst’s belief that his procedure can extract a 
logically independent factor even if, in the raw 


4 The author wishes to acknowledge the generous
assistance provided by the staff of the Wayne State
University Computing Center in adapting programs
for this purpose, and the assistance of D. R. Jacobs, 
in cross-checking many details in the data analysis. 





Item                                                  Loading
 4. Discussion among all executives affected
    (yes)                                               .62
10. Chief shows concern that all are satisfied
    (yes)                                               .52
15. Chief talks with one executive at a time
    (no)                                                [illegible]
11. Go through channels (yes)                           .52
20. Clear lines of authority (yes)                      [illegible]
28. Your satisfaction with procedure (high)             [illegible]
 2. Formal routines in decision-making (yes)            .47


data, it is thoroughly mixed in with other 
items. 

Factor IV' seems to be a “fragmentation” 
or decentralization dimension (Table 8). It is 
plausible that such a dimension would exist 
in a population of corporations, but puzzling 
that it is independent of I’. 

Factor V’ seems to represent a group of 
firms with highly personalized management, 
by which is meant that personalities may 
weigh more heavily than organization. However,
this factor accounts for only 10% of the
common variance, which fits with other re- 
ports that corporate structure and power, not 
personality as such, determine dm outcomes. 
The two remaining factors were discarded be- 
cause they had few significant loadings. 


DM and Corporate Outcomes 


Major goals of corporation executives, as 
postulated, include profits and competitive 


TABLE 8


ITEMS RECEIVING HIGHEST LOADINGS ON
ROTATED MATRIX, FACTOR IV’


Item                                                  Loading
18. Chief executive prefers division heads to
    be partisan of division (yes)                       .48
16. Normal operation of divisions (highly
    independent)                                        .46
24. Ability of strong divisions to get own
    way (win if deeply concerned)                       .45
12. Importance attached to company tradition
    and past policies (not much)                        .36
22. Importance of personalities in decisions
    (vigorous person often wins)                        [illegible]




TABLE 9


PROFIT AS PERCENT ON SALES


 2. Concern over formal steps in decision making at top level (regular
    meetings, written records, etc.) (much attention to formal routines)
 3. Estimates of cost and anticipated profit to result from a decision
    (always carefully computed)
 4. Discussion among all top executives (yes)
 8. Social interaction of top executives outside office hours (for
    nonbusiness purposes) (frequent)
11. Concern with “going through channels” (always observe)
20. Clear lines of authority (everyone knows and respects)
21. Importance attached to company “image” as seen by public (often
    outweighs cost factors)
28. Your feeling of satisfaction with the way these decisions are
    handled (very well satisfied)

Note. Items differentiating at .01 level.


stature. The variables of profit as percentage 
of sales, profit as percentage of capital, and 
size, represent indexes of such goal-achieve- 
ment. Does the type of dm activity within a 
firm have any relevance for such indexes of 
success? 

We may first look at some specific items 
from the questionnaire, and then at factor 
scores based on the dimensional analysis. An 
item analysis was carried out by taking the 
top third and bottom third of all firms on 
each of the three outcome variables, and run- 
ning t-tests on the difference in mean response 
to each item. 
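
The item analysis just described can be sketched in a few lines of code. This is a minimal illustration only: the record layout, the field names `profit_on_sales` and `item_4`, the example data, and the choice of the pooled-variance form of t are all assumptions made here, since the text does not specify which form of the t-test was used.

```python
from math import sqrt
from statistics import mean, variance


def extreme_thirds(firms, outcome):
    """Rank firms on an outcome and return the (top, bottom) thirds."""
    ranked = sorted(firms, key=lambda f: f[outcome], reverse=True)
    k = len(ranked) // 3
    if k == 0:
        raise ValueError("need at least three firms to form thirds")
    return ranked[:k], ranked[-k:]


def pooled_t(a, b):
    """Student's t for two independent groups, using the pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


def item_analysis(firms, outcome, items):
    """Return one t statistic per item, top third versus bottom third."""
    top, bottom = extreme_thirds(firms, outcome)
    return {
        item: pooled_t([f[item] for f in top], [f[item] for f in bottom])
        for item in items
    }


# Hypothetical data: nine firms, one outcome, one questionnaire item.
firms = [
    {"profit_on_sales": p, "item_4": r}
    for p, r in zip([9, 8, 7, 6, 5, 4, 3, 2, 1], [7, 6, 5, 4, 4, 4, 3, 2, 1])
]
t_values = item_analysis(firms, "profit_on_sales", ["item_4"])
```

Each resulting t would then be compared with the tabled critical value for the chosen significance level and degrees of freedom; the middle third of firms plays no part in the comparison.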

Profit on sales. The eight items which dif- 
ferentiated at the .01 level between firms most 
and least profitable are shown in Table 9. 
Some of these would be expected (costs and 
marginal profits are carefully estimated, as 
Adam Smith would have urged). Bureau- 
cratic routines are also high. However, there 
is evidence of the importance of interaction 
variables (discussions, outside socializing) 
which suggests that formal structure by itself 
is not enough. Some observers will be amused 
by the fact that “company image may out- 
weigh cost factors” leads to more profitability, 


not to losses. And in this context one is not 
sure whether executive satisfaction helps earn 
profits, or whether firms that earn profits 
have satisfied VPs. 

Profit on capital. Six items meet the .01 
criterion for profit on capital. Only minor 
differences are reflected if we compare these 
(Table 10) with those in Table 9. Profit on 
capital may be a bit more closely related to 
bureaucratic routines. However, much of this 
apparent difference would disappear if we 
published the items differentiating at the .05 
level; in general, conditions favoring profit 
on sales are also those which favor profit as 
a percentage of capital. It should be noted 
that ranking for profit on sales correlates .71 
(in this sample) with ranking for profit on 
capital. 

Size. Only five items distinguish the top 
and bottom thirds of the size distribution 
(Table 11). Most of these are plausible in 
the sense that we would expect, in a larger 
firm, that cost estimates would be more care- 
fully made, that the chief would not ask for 
much detail, and so on. 

The data were also analyzed to see if size 
functioned as a moderator variable to affect 
profitability/dm relationships. The explora- 
tion did not confirm the tentative hypothesis 
that profitable management of a smaller enter- 
prise would follow a different pattern from 
that in a larger firm. In caution, it should be 
noted that this sample of firms was limited to 


TABLE 10


PROFIT AS PERCENT ON CAPITAL


 5. Use of a top-level policy committee or operating committee (yes)
 8. Social interaction of top executives outside office hours (for
    nonbusiness purposes) (frequent)
 9. Concern of chief executive for detailed information on which to
    base decision (wants substantial detail)
15. Preferred style of chief executive (talk with all interested men
    together)
20. Clear lines of authority (yes)
28. Your feeling of satisfaction with the way these decisions are
    handled (very well satisfied)

Note. Items differentiating at .01 level.


TABLE 11


ITEMS DIFFERENTIATING SMALLER FROM
LARGER FIRMS


 3. Estimates of cost and anticipated profit (carefully made)
28. Your feeling of satisfaction (not very high)
 5. Use of top-level policy committee (yes, active)
 9. Concern of chief with detailed information (no)
20. Clear lines of authority (yes)
27. Estimated morale at top (high)

Note. Answer for larger firms; items significant at .05 level.


TABLE 12


FACTOR SCORES RELATED TO PROFITABILITY
ON SALES 1963


rather large corporations—and highly profit- 
able ones, too—so that such differences may 
not have been identifiable within the sample 
studied. 

Factor scores and profits. Table 12 shows 
the factor scores on rotated Factors I’, II’, 
IV’, and V’ for the top and bottom thirds on 
profit on sales.5 Similarly, Table 13 shows the
mean scores for profit on capital. 

Decidedly surprising is the fact that all 
four factors are significant in Table 13 and 
two in Table 12. The two nonsignificant fac- 
tors in Table 12 show differences in the same 
direction as those in Table 13. It will be recalled
that profitability had very small loadings on 
all four factors after rotation. In both Tables 
12 and 13, profitability is associated with 
greater cohesiveness, more formality (bureau- 
cratic routines), centralization, and tendency 
away from personalized management. It thus 
seems fair to conclude that outcomes are 
affected by style in dm even when some effort 
has been made to exclude the effects of profit- 
ability statistically. 
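
The approximate factor scores described in Footnote 5 amount to unit weighting: the heavily loaded items are simply added, with no differential weights. A minimal sketch follows, assuming numeric item responses keyed by item number. The item list is taken from Table 7; the response values are hypothetical, and any reverse-keying of items (which the text does not describe) is left out.

```python
# Items with the highest loadings on Factor II' (Table 7).
FACTOR_II_ITEMS = [4, 10, 15, 11, 20, 28, 2]


def unit_weighted_score(responses, keyed_items):
    """Add up the responses to the keyed items, each with a weight of 1."""
    return sum(responses[item] for item in keyed_items)


# Hypothetical responses of one executive, keyed by item number.
# Item 27 is present but is not keyed to this factor, so it is ignored.
responses = {2: 5, 4: 6, 10: 7, 11: 4, 15: 3, 20: 6, 28: 5, 27: 6}
score = unit_weighted_score(responses, FACTOR_II_ITEMS)
```

Weighting each item by its loading would be the more precise alternative; the footnote argues that the gain would have been minimal in the absence of a cross-validation sample.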

Changes in profitability. The last, and most 
severe, test of any measure of real-life vari- 
ables is its ability to predict outcomes at a 
later date. In the present instance, data were 
collected in the winter and spring of 1964. 
Since Fortune Magazine obligingly publishes 


5 The factor scores used in this analysis are ap- 
proximations. The heavily loaded items (shown in 
earlier tables) were scored for a given factor, but 
were not differentially weighted. The increase in pre- 
cision which would have resulted would have been 
minimal, and the absence of a cross-validation 
sample argued against any need for added precision.








                                     High profit   Low profit
Factor                                  rank          rank        p
I’   organizational cohesiveness       27.52         31.14      .025
II’  formality in decision making      37.32         46.14      .001
IV’  decentralization                  26.15         25.70       ns
V’   personalized management           25.49         24.74       ns





its list of the 500 largest corporations annu- 
ally, and provides rankings on profitability on 
sales and capital each year, we were tempted 
to see if our dm indexes predicted change in 
profitability. 

Profit ranks were taken for 1958, 1963, and 
1965. Change scores were computed for 1958-1963,
1958-1965, and 1963-1965. If dm indexes
recorded relative managerial efficiency,
perhaps they would correlate with these 
change scores. We therefore correlated the 
four factor scores with these six change scores, 
a total of 24 correlations. Not a single one 
reached the .05 level of significance. We must 
conclude that the hypothesis that dm pre- 
dicts increase or decrease in profitability has 
been disconfirmed. 
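
The arithmetic behind the 24 correlations can be made concrete with a short sketch. It assumes that a change score is the later rank minus the earlier rank (so that, with rank 1 as most profitable, a negative value means improvement); that sign convention, the function names, and the example ranks are assumptions for illustration and are not given in the text.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def change_scores(ranks_by_year):
    """Map each (earlier, later) year pair to per-firm rank changes."""
    years = sorted(ranks_by_year)
    return {
        (a, b): [later - earlier
                 for earlier, later in zip(ranks_by_year[a], ranks_by_year[b])]
        for a, b in combinations(years, 2)
    }


# Hypothetical profit ranks for four firms in the three years used.
ranks = {1958: [1, 2, 3, 4], 1963: [2, 1, 4, 3], 1965: [4, 3, 2, 1]}
changes = change_scores(ranks)

# Hypothetical factor scores for the same four firms.
factor_score = [10, 20, 30, 40]
r_1958_1965 = pearson_r(factor_score, changes[(1958, 1965)])

# Four factors x three periods x two profit measures = 24 tests.
n_tests = 4 * 3 * 2
expected_chance_hits = n_tests * 0.05   # about 1.2 if every null is true
prob_no_hits = 0.95 ** n_tests          # about .29 if the tests were independent
```

Under these simplifying assumptions, roughly one significant correlation in 24 would be expected by chance alone, so a complete absence of significant results is consistent with there being no predictive relationship.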


DISCUSSION 


Data confirm the author’s expectancies— 
and hence, perhaps, are suspect—in that they 
contradict the assumptions of classical theory. 
Individual executives and variant forms of 
corporate organization do have significant ef- 
fects on the dm process. Profit is a major goal 


TABLE 13


FACTOR SCORES RELATED TO PROFITABILITY ON
CAPITAL 1963


                                     High profit   Low profit
Factor                                  rank          rank        p
I’   organizational cohesiveness       27.36         30.62      .025
II’  formality in decision making      38.84         44.47      .01
IV’  decentralization                  26.58         25.22      .01
V’   personalized management           26.49         24.27      .001

but by no means the only one, and most
executives agree that in some instances they 
reject profit in favor of some other value. 

The data lend considerable support to the 
view of the firm as a coalition (March, 1962). 
Strong divisions within the company may 
get their way without regard to the welfare 
of the whole (Item 24), VPs exaggerate im- 
portance of their divisions (Item 7), some 
chief executives actually prefer this (Item 
18), and factionalism at the top is by no 
means rare (Item 25). Factor IV’ points to 
a common pattern or type of corporation 
which is relatively decentralized, a set of 
almost autonomous units under a central um- 
brella which may be little more than a finan- 
cial holding company. 

On the other hand, support is available 
here for the positions taken by Likert (1967) 
and McGregor (1960) in favor of participa- 
tive management. Involvement of all execu- 
tives (Item 4) and concern by the chief that 
all be satisfied (Item 10) are associated with 
high executive morale, satisfaction with the 
dm process, and profitability. The small, de- 
cision-making echelon at the top of a large 
corporation has some attributes in common 
with the small groups studied by experimental 
social psychologists, and with the committees 
studied by Collins and Guetzkow (1964). 
Communication patterns are important; and 
the efficiency of centralized structure, with 
concomitant loss of satisfaction for those on 
the periphery, seems to be involved in these 
results. 

Leadership is also important. The distinc- 
tion between “mediator” and “arbitrator” 
styles (Stagner, 1965) and the contrast in 
the present data between bilateral talks and 
wider executive participation may be related 
to the now traditional distinction between 
“consideration” and “structuring.” A chief 
executive who wants all affected executives 
to participate and be satisfied is certainly 
closer to the “consideration” pole than are 
those who disregard such matters. 

These data do not support the hypothesis 
that a vigorous personality may win a de- 
cision against opposition with a stronger 
power base. The social role, or control of organizational
power, seems generally more significant.
This conclusion, of course, requires


qualification. Most of the men contacted in 
the earlier study (Stagner, 1965) were judged 
to have vigorous, aggressive personalities. It 
can reasonably be assumed that respondents 
in this survey were similar. One does not be- 
come a VP in a large American corporation 
by passive-dependent behavior. Personality 
differences on this dimension therefore may 
have been relatively small, thus obscuring 
any significant trends. 

It is important to say a word about the 
problem of corporate goals. We must replace 
the concept of a single utility, profit maximi- 
zation, with the concept of multiple utilities. 
Industrial psychologists should recognize this 
as an extension of the problems encountered 
with personnel test validation. Reliance on a
single criterion measure, such as quantity of 
production, or incentive earnings, proved 
hopelessly oversimplified. According to re- 
search findings, personnel decisions should be 
based on multiple criteria with minimum 
cutting scores on a number of predictor vari- 
ables rather than a single regression equation. 
The personnel manager wants to hire work- 
ers who will have good absentee and tardiness 
records, will accept supervision, and will pro- 
duce at a high level in quality and quantity. 
There is a point on each of these beyond 
which the employee is unacceptable regardless 
of how good he is on other aspects of performance.
Thus the manager accepts a “satisficing”
solution with respect to several variables.
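
The contrast between minimum cutting scores and a single regression equation can be illustrated with a small sketch. The predictor names, cutoff values, weights, and applicant scores below are hypothetical; the point is only that a compensatory composite lets a high score on one variable offset a very low score on another, whereas the multiple-cutoff rule does not.

```python
def passes_cutoffs(scores, cutoffs):
    """Multiple-cutoff rule: every predictor must reach its minimum."""
    return all(scores[name] >= minimum for name, minimum in cutoffs.items())


def composite(scores, weights):
    """Compensatory rule: one weighted sum, as in a regression equation."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical cutoffs and (equal) weights on three predictors.
cutoffs = {"attendance": 4, "accepts_supervision": 4, "output": 4}
weights = {"attendance": 1, "accepts_supervision": 1, "output": 1}

# Applicant A: outstanding output, unacceptable attendance.
applicant_a = {"attendance": 2, "accepts_supervision": 6, "output": 9}
# Applicant B: adequate on everything.
applicant_b = {"attendance": 5, "accepts_supervision": 5, "output": 5}

a_composite = composite(applicant_a, weights)
b_composite = composite(applicant_b, weights)
a_passes = passes_cutoffs(applicant_a, cutoffs)
b_passes = passes_cutoffs(applicant_b, cutoffs)
```

In this example Applicant A has the higher composite yet fails the attendance cutoff, while Applicant B satisfies every minimum. The cutoff rule is the "satisficing" one: it asks only that each criterion be good enough.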

The problem of corporate goals must be 
handled in the same manner. Executive de- 
cisions must balance costs against customer 
goodwill, efficient production against union 
resistance, pricing policy against political re- 
percussions. The executive in such cases seeks 
a satisficing level for the other outcome mea- 
sures and then attempts to maximize profit. 
Various executives may be more interested in 
maximizing for their own division than for 
the firm. Inevitably, optimization becomes a 
delusion if taken literally. Corporate dm is 
guided by numerous values, only one of which 
is profit maximization.


STABILITY RATINGS AS CLASSIFIERS OF LIFE
HISTORY ITEM RETEST RELIABILITY


ALLAN R. STARRY 1 AND I. VAN W. RAUBENHEIMER


Measurement and Research Center, Purdue University 


AND ABRAHAM TESSER 


University of Georgia 


Investigations concerned the prediction of fakeability and stability of validated 
life history items. Two studies were conducted: (a) 26 judges rated 60 (Ques- 
tionnaire A) and 52 (Questionnaire B) discretely scored items for probable 
fakeability and stability, indexes of which were based on retest responses of 
321 undergraduates over 3 mo. Validity coefficients for the fakeability and 
stability ratings were .53 (Questionnaire A) and .50 (Questionnaire B), and .54 
(Questionnaire A) and .51 (Questionnaire B), respectively (p < .01). (b) 25
judges rated 88 continuously scored items for probable response stability. Correlation
with actual retest stability for 106 freshmen was .47 (p < .01). Little
difference was found in predicting stability for discrete and continuous items.
The Probable Response Stability scale as a potential classifier variable of life
history items as well as potential difficulties and suggestions for future research
are discussed.


Current widespread use of life history items 
as predictors of academic and vocational cri- 
teria dictates that more information be ob- 
tained on the stability of responses to bio- 
graphical questionnaires. Typically such forms 
are item analyzed and validated on a concur- 
rent basis with samples of employees, students, 
etc., on whom criterion data are available. The 
assumption that responses to personal history 
items will remain stable over time is thus 
implicit when the scored questionnaire is 
subsequently utilized as a selection tool. Sev- 
eral investigators have demonstrated, how- 
ever, that relatively imperfect stability is a 
more reasonable expectation. Their studies 
indicate that item characteristics (Owens, 
Glennon, & Albright, 1962), criterion trans- 
parency (Klein & Owens, 1965), and objec- 
tionability (Larsen, Swarthout, & Wickert, 
1967) are related to retest reliability. This 
suggests the possibility of classifying items 
according to probable stability on the basis 
of these variables. As used in the present 
study, a classifier is a measure used to sepa- 
rate the components of a predictor(s)—in this 
case items in a biographical inventory—into 


1 Requests for reprints should be sent to A. R. 
Starry, Director, Measurement and Research Center, 
Purdue University, Lafayette, Indiana 47907.


categories of expected reliability. Classifier
variables might provide a means of improving 
the predictive validity of life history forms 
on which a scoring system must be derived 
concurrently. In addition, given a large fixed 
item set serving as a multi-criteria/multi- 
population prediction device, classifiers could 
conceivably prove valuable in sorting out the 
items according to their probable stability in 
each situational context. 

Starry (1966), working with discrete data 
(binary scoring of each alternative), devel- 
oped a biographical item classification system 
based on social desirability indexes (Edwards, 
1957) for the prediction of item reliability 
and fakeability. Correlations of .31 and .45 
were obtained between social desirability 
ratings and test-retest stability and fake- 
ability response measures, respectively. Con- 
siderable shrinkage occurred when prediction 
weights developed on the 100-item experi- 
mental questionnaire were cross-validated on 
an equivalent form. A follow-up study con- 
ducted by Starry and Tesser (1967) using 
only those items in the questionnaires signifi- 
cantly related to a grade-point criterion pro- 
duced similar results. The utility of this ap- 
proach seemed to be limited to the prediction 
of item fakeability, and then only as a relative
measure within a particular item sample.
The social desirability scale used in these
previous studies generalizes well to most dis- 
crete items, but loses relevance when applied 
to continuously scored biographical items or 
to cases in which one is interested in viewing 
a set of qualitative alternatives as a whole. 
It is meaningful to describe the social desir- 
ability of a particular response alternative, 
but not the social desirability of a set of 
qualitative or continuous alternatives. One 
could, however, talk about whether the set 
of alternatives as a whole would be more 
or less prone to social desirability response 
bias. In this case we would be dealing with 
the variance of the alternatives with respect 
to their social desirability values. One might 
argue that response stability is related in some 
way to the degree to which a set of response 
alternatives are varied on this dimension. 

The problem here, as indeed the problem 
with our earlier work (Starry, 1966; Starry 
& Tesser, 1967), lies in the fact that it may 
not go far enough. In the first place two dif- 
ferent approaches are available—one for a 
particular, discretely scored response alterna- 
tive and one for a set of alternatives. Second, 
it is possible that social desirability ratings per 
se may be too general for efficient prediction 
in some particular applied setting. A set of 
response alternatives may in general have a 
large social desirability component which 
may be totally irrelevant for the particular 
purpose for which the respondent is taking the 
instrument. For example, it may be more 
socially desirable, on a superficial level, for 
a respondent to be able to play some musical 
instrument than to have no ability in this 
area, but this same item could be totally ir- 
relevant if the respondent is taking the inven- 
tory in conjunction with a job application as 
a machinist. Or, with the system used in the 
discrete case, it is possible that what is gen- 
erally socially desirable is not socially desir- 
able for a particular setting or vice versa. To 
appear gregarious, in general socially desir- 
able, may be an unacceptable kind of response 
when one is applying to become an astronaut. 

There are many other sources of variance 
that affect response stability besides respon- 
dent attempts to make the appropriate re- 
sponse in a particular setting. For example, 
an item may ask for information that a 


respondent doesn’t have, it may make finer 
discriminations than the respondent can 
handle, it may ask for information that is 
dimly remembered, or it may be sensitive 
to the respondent’s mood at a particular
time, etc. 

A scale called Probable Response Stability 
(PRS) was developed in an attempt to find 
a straightforward measure which would be 
sensitive to the issues discussed above. That 
is, it should be useful for looking at discrete 
items as well as sets of alternatives. In addi- 
tion it should reflect not only differences 
in situation-specific social desirability but 
other sources of response stability variance 
as well. 

The assumption underlying this scale is that 
external raters armed with information about 
the situational context, purpose of the instru- 
ment, and respondent population can make 
judgments about how stable the responses to 
the items will be. If this rationale has any 
merit one would expect two outcomes: First, 
that the judgments made by raters will be 
reliable (indicating that there is something in 
the task that can be uncovered) and second 
that the PRS scale will make a significant 
contribution to the prediction of response 
stability. 

The investigations reported here are at- 
tempts to evaluate the PRS scale as a poten- 
tial classifier variable. Methodological con- 
siderations in the use of classifiers and empiri- 
cal evaluations of their usefulness in personal 
history research must be the subject of future 
explorations. Two sets of data were used in 
the following studies: one set based upon 
discrete (binary) scoring of biographical data 
and the other on continuously scored items. 
The analysis of discrete items will be referred 
to as Study I and the analysis of continuous 
items as Study II. 


STUDY I
Method 


The method used to collect reliability and fake- 
ability measures on discrete items has been described 
previously in detail (Starry, 1966; Starry & Tesser, 
1967). In summary, two 100-item questionnaires were 
administered to a sample of 321 undergraduate 
males. Half of Ss who received Form A and half 
of Ss who received Form B were instructed to 
respond to the items in such a way as to make
themselves appear maximally attractive to persons
using this instrument as a basis for admitting stu- 
dents into graduate school (fake set). The remaining 
Ss were asked to respond honestly to the items 
(honest set). Approximately 3 mo. later, the instru- 
ments were readministered to all Ss under the honest 
set. Item analysis was conducted on both question- 
naire forms using a criterion of college grade point 
average, resulting in 60 discriminating items for 
Form A and 52 for Form B. The item stability 
measures (i.e., reliability and fakeability) were gener- 
ated by calculating the percentage of consistent 
responses to each item’s discriminating alternative 
on the two administrations. 
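
One way to read the item stability measure is sketched below. It assumes that a response counts as consistent when the subject either chose the discriminating alternative on both administrations or avoided it on both; that reading, the function name, and the example responses are assumptions, since the computation is described only briefly here. Applied to the honest-honest group the percentage would serve as the reliability index, and applied to the fake-honest group as the fakeability index.

```python
def percent_consistent(first, second, keyed_alternative):
    """Percentage of subjects whose status on the keyed alternative
    (chosen versus not chosen) is the same on both administrations."""
    pairs = list(zip(first, second))
    same = sum(
        1
        for a, b in pairs
        if (a == keyed_alternative) == (b == keyed_alternative)
    )
    return 100.0 * same / len(pairs)


# Hypothetical responses of four subjects to one item on two occasions.
first_admin = ["a", "b", "c", "a"]
second_admin = ["a", "c", "a", "a"]
stability = percent_consistent(first_admin, second_admin, "a")
```

Note that the second subject changed answers from "b" to "c" but still counts as consistent under this reading, because neither response is the keyed alternative.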

To obtain the item classification information for 
the present study, discriminating alternatives in both 
biographical forms were rated by 26 advanced under- 
graduates with the PRS scale shown below. Judges 
were instructed to read the entire questionnaire be- 
fore assigning ratings and to consider each designated 
alternative within the complete item context. 


Probable Response Stability Scale 
1 2 3 4 5 6 7 8 9


High 
Stability 


Moderate 
Stability 


Low 
Stability 


The rating task is to judge, on this 9-point
scale, the extent to which designated responses
to certain items on the questionnaire are
likely to remain stable over a 3-mo. time
interval.

They were also informed that the respondent
population consisted of male undergraduates. A
sample rating with a verbal description of the logic
which had been used in arriving at a given stability
value was included in the instructions. Mean stability
ratings of the 112 discriminating alternatives ranged
from 4.58 to 8.35.


Results 


Estimated reliability of the mean ratings 
was .85 for Form A and .86 for Form B 


TABLE 1


CORRELATIONS BETWEEN MEAN PRS RATINGS
AND ITEM STABILITY INDEXES FOR
DISCRETELY SCORED ITEMS


                 Probable response stability ratings
Item index       Form A (60 items)    Form B (52 items)
Reliability            .54*                 .51*
Fakeability            .53*                 .50*

* p < .01.


items (Winer, 1962, p. 126). These results are 
summarized in Table 1. Validity coefficients 
for the ratings were in the low .50’s, indicating 
a moderate degree of overlap between PRS 
and actual stability and fakeability. 
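
The reliability of mean ratings cited from Winer (1962) is an analysis-of-variance estimate. The formula itself is not reproduced in the text, so the sketch below uses one common form, 1 - MS(within items) / MS(between items), as an assumption; the function name and the example ratings are hypothetical.

```python
def mean_rating_reliability(ratings):
    """Estimate the reliability of the mean of k judges' ratings.

    ratings: list of rows, one row per item, one column per judge.
    Returns 1 - MS_within / MS_between from a one-way layout by item.
    """
    n_items = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_items * k)
    item_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in item_means) / (n_items - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, item_means) for x in row
    ) / (n_items * (k - 1))
    return 1 - ms_within / ms_between


# Hypothetical ratings: three items, each rated by two judges.
example = [[1, 2], [3, 4], [5, 6]]
reliability = mean_rating_reliability(example)
```

With many judges, as with the 26 used here, the mean rating can be highly reliable even when any single judge is not, because disagreement among judges averages out.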

Total score reliability (test-retest) dropped 
from .79 to .50 (Form A) and from .78 to .47 
(Form B) when the fake set was introduced. 
There was some tendency for those items on 
which faking took place to be less reliable. 
Stability of items under honest test-retest 
conditions correlated .63 and .67 with that 
obtained under the influence of this fake set. 


STUDY II
Method 


During orientation week in September, 1967, 
Purdue freshmen completed a 671-item biographical 
questionnaire constructed for some other research 
purposes. From this instrument the authors selected 
88 five-alternative, continuously scored items repre- 
sentative of content areas sampled by the complete 
questionnaire. Although question topics overlapped 
in some areas, none of the items had appeared in the 
questionnaire used in Study I. These items were 
then readministered 16-20 wk. later to a volunteer 
group of 106 male and female freshmen. The Ss were 
instructed to respond as accurately as possible, with 
no fake set being introduced. 

A stability index for each item was obtained by 
computing the absolute difference between responses 
given by each S on the two administrations, summing
across all Ss, and dividing by the number responding
to that item on both administrations (N
per item varied from 104 to 106). In order to express
the stability index and PRS ratings in more similar
terms, this quantity was then subtracted from a constant
of 9.00. The stability index ranged from 7.86
for the least stable item in this questionnaire to 8.89
for the most stable.
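
The stability index for a continuously scored item follows directly from this description and can be sketched as below. The use of `None` to mark a missing response and the example data are assumptions for illustration.

```python
def stability_index(first, second, constant=9.0):
    """Constant minus the mean absolute test-retest difference.

    Subjects who did not answer the item on both administrations
    (marked None here) are left out of the average.
    """
    pairs = [
        (a, b) for a, b in zip(first, second)
        if a is not None and b is not None
    ]
    mean_abs_diff = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return constant - mean_abs_diff


# Hypothetical responses of five subjects to one five-alternative item.
first_admin = [1, 2, 3, None, 5]
second_admin = [1, 3, 5, 4, 5]
index = stability_index(first_admin, second_admin)
```

Higher values mean greater stability: an item answered identically by everyone on both occasions would score 9.00, the top of the PRS scale.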

The 88 items were rated for stability by 25 male 
and female graduate students with the PRS scale 
used in Study I. Instructions were modified to 
accommodate the different item type. Judges were 
instructed to assign probable stability values to the 
total item instead of to a single designated item 
alternative as was the case with discrete scoring. 
Mean PRS ranged from 5.44 to 9.00. 


Results 


Reliability of the mean PRS ratings, com- 
puted as before, was estimated to be .97. The 
product moment correlation between item 
stability and mean PRS values across the 88 
items was .47, significant at the .01 level. 
The correlation between mean rating and 
item stability index standard deviation was
calculated as a check on the interpretation
of the rating task. The lack of association between
these variables (r = -.03) tends to
indicate that judges were not attempting to 
rate the probable stability of responses be- 
tween (as opposed to within) the respondents. 


DISCUSSION 


The PRS rating scale seems to offer the 
universality necessary for rating both item 
types with acceptable interrater reliability, 
although its effective range was less than 
four scale points in both cases. 

Little practical difference was found in these 
studies between the predictability of test- 
retest item stability for discrete or continu- 
ous type biographical items. Although the 
fakeability and stability indexes for the same 
discrete items were rather dissimilar, pre- 
dictability was virtually the same for both. 
External judges were able to account for 
approximately 25% of the criterion variance 
in either item type. 

While stability ratings would seem to offer 
some practical utility as a classifier variable 
in applied prediction studies with biographical 
items, several problems must first be overcome. 
Perhaps the most serious concerns the shape 
of the bivariate distribution of rated (PRS 
scale) versus empirical stability. Although the 
scatter-plots produced in these studies dis- 
played a fair degree of linear regression, 
homoscedasticity was weak. In particular, 
prediction appeared best at the “High Stable”
end of the scale. Unless this phenomenon is 
merely sampling fluctuation, elimination of 
some of the lowest rated items in a question- 
naire would not result in the increase to 
total score stability which coefficients around 
.50 would suggest. To the researcher who
could afford it, deletion of a large percentage 
of the lower rated items would probably in- 
crease the average stability of remaining 
items, but total score stability may not im- 
prove because of the effects of shortening the 
instrument. Increased emphasis on the at- 
tainment of normal distributions during con- 
struction of classifiers and the biographical 
instruments themselves (Starry, 1968) seems 
advisable. In addition, the relationship be- 
tween item reliability and validity is almost 
certain to be partially dependent on the 


criterion under investigation. It is conceivable 
that in certain prediction settings items 
judged to be relatively less stable could actu- 
ally be the most valid. Their elimination 
might seriously impair the validity of the 
biographical form under investigation. The 
solution to both these problems would appear 
to be some effective system for the predic- 
tion of stability which takes item validity 
into account and gives the researcher more 
assurance that the items he deletes because 
of low rated stability are actually too un- 
stable for inclusion in his selection instrument. 
Perhaps a combination of classifier informa- 
tion used in conjunction with routine con- 
current item analysis statistics will provide 
such a system. 
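
The earlier caution that total score stability may not improve "because of the effects of shortening the instrument" can be given a rough numerical form. The text names no formula; the Spearman-Brown prophecy formula is used below as a familiar stand-in, under its usual assumption that the retained items are comparable to those removed. The function name and the choice of a 60-item form cut to 40 items are illustrative assumptions.

```python
def spearman_brown(reliability, length_ratio):
    """Projected reliability when test length is multiplied by length_ratio.

    length_ratio below 1 shortens the test; above 1 lengthens it.
    """
    return (length_ratio * reliability) / (
        1 + (length_ratio - 1) * reliability
    )


# A 60-item form with total score reliability .79, cut to 40 items
# of comparable quality (hypothetical deletion of 20 items).
projected = spearman_brown(0.79, 40 / 60)
```

Under these assumptions the projected reliability falls to about .71, so the items that remain would have to be noticeably more stable than average merely to bring total score stability back to its original level.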

The authors have accumulated some evi- 
dence in pilot studies conducted during devel- 
opment of the PRS scale that rater knowledge 
of respondent and criterion characteristics 
does not enhance their ability to predict item 
stability, suggesting some degree of scale in- 
variance. However, the extent to which item 
ratings might be invariant with respect to the 
apparent range of stability peculiar to the 
set of items of which they are a part is 
unknown. An item rated as moderately stable 
in one context might be rated highly stable 
in another. Until this matter is resolved and 
rated values associated with predicted ranges 
of empirical stability, the researcher working 
with a particular item set will find it difficult 
to formulate standards and cutoff points for 
item deletion. 

Total score test-retest reliability coefficients 
in the high .70’s for the 52 and 60 item dis- 
cretely scored questionnaires, while perhaps 
not as large as might be desired, are reasonable
in view of the uncontrollable sources of
variation encountered with “volunteer” college 
student respondents. It is this lack of control 
which may account for the low reliability 
of .60 obtained with the 88-item continuously 
scored questionnaire. Although every effort 
was made to elicit accurate information on 
both administrations, the length of the origi- 
nal questionnaire of which these items were 
a part undoubtedly resulted in carelessness 
on the part of some students. Low reliability 
in Study II might also be related to the 
specific respondent sample which was com-
posed of first semester college freshmen, a
group highly subject to changes in value 
systems and perceptions. All the questionnaire 
reliability coefficients reported here should 
probably be viewed as lower-bound estimates. 

EFFECTS ON NEGRO AND WHITE TEST PERFORMANCES 1

This research investigated the hypotheses that (a) extra pretest practice, (b)
extra testing time, and (c) extra practice and extra testing time would improve
the mental ability test performances of Negroes more than whites. The Ss,
Negro and white high school students in the higher and lower socioeconomic
classes, were administered parallel forms of several ability tests. Some Ss took
the tests under speeded conditions, others under power conditions. Although
both races and both socioeconomic classes improved their performances as the
testing procedures became more lenient, all groups profited to a comparable
extent; the three hypotheses were rejected. Implications are that the testing
procedure itself does not discriminate between racial groups nor between
culturally advantaged and disadvantaged Ss.


The question of whether ability tests un- 
fairly discriminate against minority groups is 
of great concern to psychology, education, 
and industry. Researchers interested in this 
problem have generally focused their atten- 
tion on two aspects of discrimination, that is, 
test content and analysis of test results. An- 
other potential aspect of unfair discrimination 
involves the testing procedure itself. Since it 
is conceivable that certain testing conditions 
systematically favor one cultural group over 
another, variables such as test administra- 
tor’s race, test directions, methods of respond- 
ing, testing time, and amount of pretest 
practice need more attention. 

The purpose of the present study was to 
determine if highly speeded tests are equally 
fair to Negroes and whites and whether, as other
research indicates (Boger, 1952; Eagleson,
1937; Katzenmeyer, 1962; Klineberg, 1928;
Vane & Kessler, 1964), extra practice or test
familiarity would reduce the Negro-white test
score discrepancies. It was hypothesized that 


1 This study is based on a thesis submitted by the
senior author to the Department of Psychology,
University of Houston, in partial fulfillment of the
requirements for the degree of Doctor of Philosophy.

2 Now with Lifson, Wilson, Ferguson & Winick,
Inc., Management Consultants, 1811 Crawford, Hous-
ton, Texas 77002. Requests for reprints should be sent
to Jerry D. Dubin at the above address.
Negroes would benefit more than whites when
opportunities were available for (a) extra pre-
test practice, (b) extra testing time, and (c)
both extra practice and extra testing time. 


METHOD 
Subjects 


Two hundred and thirty-five Negro students from 
a predominately Negro high school and a random 
sample of 232 white students from a predominately 
white high school in the same school system served 
as Ss. The Negro and white students were divided 
into four groups: S1, S2, P1, and P2. Groups S
were administered speeded tests, whereas Groups P 
received power tests; Groups 1 were comprised of 
students in the ninth and tenth grades, while Groups 
2 were eleventh and twelfth graders. Since both 
schools were within the progressive Galena Park 
(Texas) School District, all Ss were quite familiar 
with standardized tests. 


Ability Tests 


Forms A and B of four Employee Aptitude Survey 
(EAS) tests were used: Numerical Ability (solving 
computational problems), Space Visualization (count- 
ing three-dimensional blocks), Numerical Reasoning 
(solving number series items), and Verbal Reasoning 
(drawing valid conclusions from a list of facts). 
Both forms for each test were constructed in a 
parallel manner and are statistically equivalent 
(Ruch & Ruch, 1963).


Procedure 


Before testing began, Ss were instructed that each 
test had two parts (actually the alternate forms) 
TABLE 1

DISTRIBUTION OF THE SAMPLE BY GRADE LEVEL,
RACE, SOCIOECONOMIC STATUS, AND TESTING
CONDITIONS

                                  Sample size

                        Timed           Untimed         Matched
Ss                      administration  administration  pairs

9th-10th graders
  Negro
    Low SES                 24              29             14
    High SES                33              18             14
  White
    Low SES                 28              32             25
    High SES                29              26             25
  Total                    114             105             78
11th-12th graders
  Negro
    Low SES                 44              38             32
    High SES                26              23             10
  White
    Low SES                 31              33             31
    High SES                27              26             24
  Total                    128             120             97





to be taken consecutively. Negroes and whites in
Group S1 were administered both forms of the
“speeded” Numerical Reasoning and Space Visualiza-
tion Tests with the regular 5-min. time limits; Group
P1 was administered both forms of the “power”
Numerical Reasoning Test with tripled time limits.
Similarly, Group S2 was administered both forms 
of the Verbal Reasoning and Numerical Ability Tests 
with the regular 5- and 10-min. time limits and 
Group P2 was administered both forms of the Verbal 
Reasoning Test with tripled time limits. In an at- 
tempt to reduce the effects of the test administrator’s 
race (all tests were administered by whites), Negro 
teachers assisted in the testing of Negro students. 
Upon finishing the tests Ss were asked to complete 
a short socioeconomic questionnaire. Based on four 
items (father’s occupation, father’s education, 
mother’s education, and student’s educational expec- 
tancies), a socioeconomic status (SES) index was 
derived. Dichotomizing the index for both racial 
groups resulted in four categories: high SES whites 
(HW), low SES whites (LW), high SES Negroes
(HN), and low SES Negroes (LN). See Columns 
1 and 2 in Table 1 for sample distributions. The 
tests and questionnaires were administered during a 
2-day session in December 1967. 


RESULTS 
Raw Test Scores 


Figure 1 illustrates a relationship between 
test performance and racial-socioeconomic fac-
tors. Typically, on each test for each condi-
tion, the whites outperformed the Negroes,
and the culturally advantaged students out- 
performed the culturally disadvantaged (Table 
2). With one minor exception, an order was 
maintained; the high SES whites performed 
best followed by the LW, HN, and LN groups. 
(Somewhat similar results were found by 
Fifer, 1964.) 

In another expected finding, Figure 1 
demonstrates that all Ss attained higher test 
scores as the testing situation became more 
lenient. Mean test scores increased con- 
sistently as the procedure progressed from 
speeded tests—no practice, to speeded tests— 
practice, to power tests—no practice, to power 
tests—practice. This steady improvement was 
found for both racial groups and both SES 
groups. The improvements obtained by Ne- 
groes and whites are compared below, sepa- 
rately for each hypothesis. 


Hypothesis 1 


In analyzing the hypothesis that extra prac- 
tice would be more advantageous to Negro 
than to white Ss, the differences between 
Forms A and B for each test were analyzed 
by the double classification analysis of vari-
ance model.3 In six investigations of practice
effects there were no significant differences
attributable to race (Table 3). Consequently,
the extra practice equally enhanced Negro
and white test performances, and Hypothe-
sis 1 was clearly refuted.

Furthermore, Table 3 shows that extra 
practice profits both socioeconomic classes to 
a comparable degree. (The one exception is
probably due to a sampling error, see Figure
1a.) Unexpectedly, the Race X SES interac-
tion was significant for two of the four
speeded tests. This suggests that the low SES
whites and high SES Negroes improve most
when extra practice is provided on certain 
types of speeded tests. 
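
As an illustration only, the analysis of difference scores with unequal cell sizes can be sketched in Python. The sketch assumes effect coding (+1/-1) for race and SES and tests each source of variance by comparing the full least-squares model with a model that omits that source. This is one common reading of a least-squares solution for unequal cells and is not taken from the study's own computations. The function names and the sample difference scores are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sse(columns, y):
    """Residual sum of squares after a least-squares fit of y on the columns."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * v for p, v in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * columns[i][t] for i in range(k))
              for t in range(len(y))]
    return sum((v - f) ** 2 for v, f in zip(y, fitted))


def two_way_f(records):
    """Return F ratios for race, SES, and Race X SES, plus the error df.

    records: (race_code, ses_code, difference_score), codes being +1 or -1.
    """
    y = [d for _, _, d in records]
    n = len(y)
    cols = {
        "mean": [1.0] * n,
        "race": [float(r) for r, _, _ in records],
        "ses": [float(s) for _, s, _ in records],
        "race_x_ses": [float(r * s) for r, s, _ in records],
    }
    full = sse(list(cols.values()), y)
    df_error = n - 4  # four cell means are estimated
    f_ratios = {}
    for name in ("race", "ses", "race_x_ses"):
        reduced = sse([c for key, c in cols.items() if key != name], y)
        f_ratios[name] = (reduced - full) / (full / df_error)
    return f_ratios, df_error


# Hypothetical Form B minus Form A scores in four unequal cells.
records = (
    [(+1, +1, d) for d in (3, 5, 4)]
    + [(+1, -1, d) for d in (1, 3)]
    + [(-1, +1, d) for d in (3, 5, 4, 4)]
    + [(-1, -1, d) for d in (1, 3, 2)]
)
f_ratios, df_error = two_way_f(records)
print(f_ratios, df_error)
```

Each F ratio has 1 degree of freedom in the numerator and `df_error` in the denominator, matching the 1/110 style of entry in the df column of Table 3.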


Hypothesis 2 


To investigate the hypothesis that extra 
testing time would favor Negro students, Ss 
taking speeded tests (Groups S1 or S2) were 

3 Because the cell sizes were unequal the sum of
squares for each analysis was estimated by the least-
squares solution (see Winer, 1962).
Fig. 1. Average score for each test by race, socioeconomic status, and testing conditions.


paired with Ss taking power tests (Groups P1 
or P2). Pairings were controlled by matching 
Ss for race, sex, grade level, and SES index. 
(Because of this restrictive matching pro- 
cedure, the sample was substantially reduced, 
see Column 3 in Table 1.) Hypothesis 2 was 
then tested by comparing the speed-power 
test differences between the matched Negroes 
with the differences between the matched 
whites. 
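
The matching step can be sketched as follows, purely as an illustration. The sketch assumes each S is held as a small record with hypothetical field names (`race`, `sex`, `grade`, `ses`, `score`) and pairs Ss in the order they appear; the study does not report how ties among eligible partners were resolved, and the scores shown are invented.

```python
from collections import defaultdict


def match_pairs(speeded, power):
    """Pair each speeded-test S with an unused power-test S who has the same
    race, sex, grade level, and SES index; Ss with no partner are dropped."""
    pool = defaultdict(list)
    for s in power:
        pool[(s["race"], s["sex"], s["grade"], s["ses"])].append(s)
    pairs = []
    for s in speeded:
        key = (s["race"], s["sex"], s["grade"], s["ses"])
        if pool[key]:
            pairs.append((s, pool[key].pop(0)))
    return pairs


def mean_gain_by_race(pairs, form="A"):
    """Mean (untimed minus timed) score difference within each racial group."""
    gains = defaultdict(list)
    for timed, untimed in pairs:
        gains[timed["race"]].append(untimed["score"][form] - timed["score"][form])
    return {race: sum(v) / len(v) for race, v in gains.items()}


# Hypothetical records; the third speeded S has no eligible partner.
speeded = [
    {"race": "N", "sex": "M", "grade": 9, "ses": 3, "score": {"A": 10, "B": 12}},
    {"race": "W", "sex": "F", "grade": 10, "ses": 5, "score": {"A": 14, "B": 15}},
    {"race": "W", "sex": "M", "grade": 9, "ses": 2, "score": {"A": 9, "B": 9}},
]
power = [
    {"race": "N", "sex": "M", "grade": 9, "ses": 3, "score": {"A": 15, "B": 16}},
    {"race": "W", "sex": "F", "grade": 10, "ses": 5, "score": {"A": 18, "B": 20}},
]
pairs = match_pairs(speeded, power)
print(len(pairs), mean_gain_by_race(pairs, "A"))
```

Dropping unmatched Ss is what shrinks the sample, which is why the matched-pairs column of Table 1 is smaller than the timed and untimed columns.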

As shown in Table 3, Negro and white 
improvement scores do not significantly dif- 
fer. Hypothesis 2 was therefore rejected. 
Similarly, socioeconomic status and the Race 
X SES interaction were unrelated to im-
provements resulting from extra testing time.


Consequently, administering tests without 
time limits does not favor Negroes or 
culturally disadvantaged students. 


Hypothesis 3 


The third hypothesis, that Negroes will 
benefit most from the combination of extra 
practice and extra time, also was tested by 
matching Ss. Using the same matched pairs 
as used for testing the previous hypothesis, 
the difference scores (in this case Form B 
untimed minus Form A timed) were calcu- 
lated and analyzed. Again, the hypothesis was 
rejected. Neither race, socioeconomic status, 
nor the Race X SES interaction was signifi- 
cantly related to score improvements resulting 
from extra practice and extra testing time
(Table 3). 


DISCUSSION AND CONCLUSIONS 


The results failed to support the hypotheses 
that Negroes would be favored by extra prac- 
tice and/or extra testing time. Apparently, 
the administration of highly speeded tests 
given without extra practice did not handicap 
the average Negro S nor Ss in lower socio- 
economic classes. In a more general sense the 
results imply that the testing procedure 
itself was not a major factor in discriminating 
between Negro and white Ss or between 
culturally advantaged and culturally disad- 
vantaged Ss. 

Although the main effects of race and SES 
were not significant, the Race X SES interac- 
tion was significant for two of the four timed 
tests. On these tests, Verbal Reasoning and 
Space Visualization, the low SES whites and 
high SES Negroes improved more when given 
extra practice than did the high SES whites 
and low SES Negroes. Neither of the un- 
speeded tests showed a significant interaction 
effect. Why the low SES whites and high 
SES Negroes benefited more from extra prac- 
tice and why the interactions were significant 
only for particular tests and only under 
speeded conditions is open to speculation. 

One possible explanation is that Ss in the 
low SES white and high SES Negro groups 
do not function near their capacity in re- 


TABLE 2

SUMMARY OF F RATIOS FOR TWO-WAY ANALYSIS
OF VARIANCES USING TEST SCORE DATA

                                     Source of variance

Test/treatment          df        Race         SES        Race X SES

Numerical reasoning
  Timed A             1/110     20.638**      0.374        0.325
  Timed B             1/110     28.422**      5.832*       0.876
  Untimed A           1/101     14.371**      3.728        1.001
  Untimed B           1/101      9.499**      1.897        0.957
Space visualization
  Timed A             1/110     49.783**      1.790        0.335
  Timed B             1/110     38.894**      0.560        3.728
Verbal reasoning
  Timed A             1/124     23.043**      5.230*       0.076
  Timed B             1/124     30.4354**     8.670**      2.646
  Untimed A           1/116     23.649**     12.980**      3.069
  Untimed B           1/116     24.548**      9.764**      0.435
Numerical ability
  Timed A             1/124     25.834**      9.531**      3.938*
  Timed B             1/124     27.004**      7.383**      2.391

* p < .05.
** p < .01.

TABLE 3

ANALYSIS OF VARIANCE SUMMARIES
FOR DIFFERENCE SCORES

                                 Source of variance

Tests a          df        Race        SES       Race X SES

Hypothesis 1: Extra practice (Form B minus Form A)

NR speeded     1/110      1.313      6.153*       0.258
SV speeded     1/110      2.218      0.950        6.792*
VR speeded     1/124      1.236      1.059        4.350*
NA speeded     1/124      1.501      0.285        0.852
NR power       1/101      0.569      0.393        0.000
VR power       1/116      0.304      1.185        2.684

Hypothesis 2: Extra time (untimed minus timed)

NR Form A      1/74       2.601      1.385        0.786
NR Form B      1/74       0.058      0.047        0.361
VR Form A      1/93       0.437      0.160        0.516
VR Form B      1/93       0.504      0.153        0.499





Hypothesis 3: Extra practice and time
(untimed Form B minus timed Form A)

NR             1/74       1.005      0.658        0.198
VR             1/93       0.009      0.037        0.003

a NR = Numerical Reasoning; SV = Space Visualization;
VR = Verbal Reasoning; NA = Numerical Ability.
* p < .05.


stricted situations, that is, when examined on 
uncommon tasks4 or tested under highly
speeded conditions. When provided with extra 
pretest practice, therefore, these Ss work more 
effectively and increase their test scores 
substantially. The low SES Negroes, the most 
culturally deprived Ss, are also functioning 
below their potential. Their skills or test- 
taking abilities, however, are so underdevel- 
oped that a single practice test is not suf- 
ficient to allow them to master the unfamiliar 
tasks and the highly speeded conditions. On 
the other hand, the most culturally advan-
taged Ss (i.e., the high SES whites) have
broader experiences and, hence, are better 
prepared for the tasks. Consequently, they 
work rather well on the first administration 
and only improve to a moderate extent when 
provided with extra practice. 

4 Because of the school district’s testing program,
all students were quite familiar with the two numeri-
cal tests but unfamiliar with the spatial and verbal
tests.
The findings of this research must be con-
sidered with respect to the study’s limitations.
One of the major limitations was the need to 
match Ss taking speeded and power tests. 
The insignificant results for Hypotheses 2 
and 3, therefore, could be related to the 
matching procedure. A second limitation is 
the fact that Ss were a homogeneous group 
of high school students experienced with 
standardized testing. It is not known if the 
results are generalizable to Ss less experi- 
enced in test taking, to older Ss, or to 
industrial populations. A third limitation was 
the use of only one practice test. Remaining 
to be tested are the effects of a series of 
practice tests. 

With the above limitations noted, the most 
important implication of this research is that 
speeded tests do not handicap Negro Ss nor 
are they likely to handicap the future Negro 
job applicant. 


REFERENCES 


Boger, J. H. An experimental study of the effects of
perceptual training on group IQ test scores of
elementary pupils in rural ungraded schools.
Journal of Educational Research, 1952, 46, 42-52.

Eagleson, O. W. Comparative studies of white and
Negro subjects in learning to discriminate visual
magnitude. Journal of Psychology, 1937, 4, 167-
197.

Fifer, G. Social class and cultural group differences
in diverse mental abilities. Proceedings of the
invitational conference on testing problems.
Princeton: Educational Testing Service, 1964, 107-
117.

Katzenmeyer, W. G. Social interaction and dif-
ferences in intelligence test performance of Negro
and white elementary school pupils. Unpublished
doctoral dissertation, Duke University, 1962.

Klineberg, O. An experimental study of speed and
other factors in racial differences. Archives of Psy-
chology, 1928, 15, 109-122.

Ruch, F. L., & Ruch, W. W. Technical report:
Employee aptitude survey. Los Angeles: Psycho-
logical Services, Inc., 1963.

Vane, J. R., & Kessler, R. T. The Goodenough
Draw-A-Man Test: Long term reliability and
validity. Journal of Clinical Psychology, 1964, 20,
487-488.

Winer, B. J. Statistical principles in experimental
design. New York: McGraw-Hill, 1962.


(Early publication received August 19, 1968) 


Journal of Applied Psychology,
1969, Vol. 53, No. 1, 24-34 


CONTRIBUTIONS OF THE INTERVIEW TO ASSESSMENT
OF MANAGEMENT POTENTIAL


DONALD L. GRANT 1 AND DOUGLAS W. BRAY
American Telephone and Telegraph Company 


The contribution of interview information to assessment center evaluations and 
the relationship of interview variables to progress in management are presented. 
The interview data were obtained by coding interview reports. Analyses of the 
data clearly indicate that information from the interview reports contributes 
to assessment center evaluations. Judgments of career motivation and to a 
lesser extent work motivation and control of feelings appear to have been influ- 
enced by the interview information. In addition, judgments of interpersonal 
skills were reinforced, if not influenced, by the interview reports. The results 
of the study also demonstrate that extensive and reliable information on many 
personal characteristics can be obtained from the interview. In addition, several 
of the interview variables, especially those reflecting career motivation, depen- 
dency needs, work motivation, and interpersonal skills are directly related to 
progress in management. The findings clearly indicate that relevant information 
on personal characteristics important to managerial success was obtained from
interview reports.


The Bell System Management Progress 
Study, a longitudinal investigation of the 
development of young men in a business man- 
agement environment (Bray, 1964), provides 
the opportunity for a thorough investigation 
of assessment center procedures. An impor- 
tant part of such an investigation is a study 
of the contributions of the various assessment 
techniques to assessment staff judgments. The 
center used in the study has been described 
and the results of analyses for most of the 
assessment techniques presented (Bray & 
Grant, 1966; Grant, Katkovsky & Bray, 
1967). In addition to examining the relation- 
ships of the techniques to assessment staff 
judgments, correlations of each technique and 
of staff judgments to a progress criterion have 
been reported. 

A major omission from the previous analy- 
ses has been the assessment interview. The 
present article presents information on the 
contributions made by the interview to the 
assessment center process and relationships of 
interview variables to the progress criterion. 

The interview is, of course, the most widely 
used method of evaluating candidates for em- 
ployment, including candidates for manage- 


1 Requests for reprints should be sent to Donald L. 
Grant, Personnel Manager, Research, American Tele- 
phone and Telegraph Company, 195 Broadway, 
Room 2122, New York, New York 10007. 
ment positions. Research evidence in support
of assessing people by means of the interview, 
however, is limited. Two recent reviews of 
research on the selection interview (Mayfield, 
1964; Ulrich & Trumbo, 1965) raise ques- 
tions concerning the technique. Mayfield 
(1964) notes that data supporting the selec-
tion interview are not substantial. He par-
ticularly questions the consistency of materials
covered and inter-rater reliability in the un- 
structured interview. He concludes, moreover, 
that even where the reliabilities of the selec- 
tion interview, structured or unstructured, 
are high, “the validities obtained are usually 
of a low magnitude” (Mayfield, 1964, p. 
251). In addition, he declares that the only 
characteristic which can be estimated reliably 
and validly from interviews is that of mental 
ability. 

Ulrich and Trumbo (1965) also found data 
favoring structured over unstructured inter- 
views in selection. The structured interviews 
have proved more valid. In contrast to 
Mayfield, however, they conclude that the 
interviewer can most validly assess the areas 
of personal relations and motivation to work 
(Ulrich & Trumbo, 1965, p. 113). 

In addition to its widespread use in person- 
nel selection the interview has been an integral 
part of assessment center procedures. In gen- 
eral, however, its contributions to the assess-
ment process, along with those of other
assessment center techniques, have not been 
studied. A partial exception to this generaliza- 
tion is the assessment center activity of the 
Office of Strategic Services in World War II 
(OSS Assessment Staff, 1948). In their centers 
the interview was considered the most impor- 
tant single procedure and the interviewer’s 
rating had the most influence on the final 
decision by the assessment staff concerning 
a candidate’s acceptability. The interviewer’s 
rating, however, was not based solely on the 
interview. Prior to the interview the inter- 
viewer was furnished with considerable in- 
formation on the candidate (personal history 
record, health inventory, projective question- 
naire, etc.). He also had the opportunity to 
observe the candidate in many situations. As 
a consequence, the impact of the interview 
per se could not be ascertained. 

The predictive value of the interview when 
used in assessment center or analogous activi-
ties has been reported by a few investigators.
Kelly and Fiske (1951) report that the inter- 
view added little to the prediction of subse- 
quent performance by clinical psychologists. 
MacKinnon (1958) presents correlations be- 
tween interviewer ratings and various criteria 
of the effectiveness of Air Force officers. Many 
are statistically insignificant and the remain- 
der low (i.e., in the .20s). In a relatively de- 
tailed report Prien (1962) presents data 
showing rather low reliabilities between in- 
terviewers’ ratings, statistically insignificant 
correlations between interviewers’ ratings and 
supervisors’ ratings of the performance of 
sales candidates, and several statistically re- 
liable correlations between the interviewers’ 
ratings and supervisors’ ratings of perform- 
ance of candidates for managerial and tech- 
nical positions. 

The data from the Bell System Manage- 
ment Progress Study assessment center inter- 
views have been examined to ascertain their 
relationships to assessment staff judgments 
which have been shown to be predictive of 
progress in management; and the variables 
from the interviews have been related to a 
progress criterion. Neither the interview data 
nor any other assessment information has
been available to Bell System management or
to Ss. There is, therefore, no contamination
of the progress criterion. 


METHOD 
Interviewing 


The interviews in the Management Progress Study 
assessment centers are relatively unstructured. Prior 
to the interview the interviewer is furnished with a 
completed personal history record on S which he 
reviews for essential biographical information and 
for areas in which to probe during the interview 
(e.g., relationships with parents and siblings). The 
interviewer also is instructed to cover a number of 
topics during the interview such as work goals, 
attitudes on social issues, and hobbies. 

The interview is conducted so as to insure privacy. 
Each S is asked by the interviewer for permission to 
take notes and is assured of the confidentiality of 
anything he says. None of the interviews has been 
recorded. The interviews are relatively informal, the 
interviewer being free to follow leads as they develop 
in the interview and to vary his style of interviewing 
(directive or nondirective) in accord with circum- 
stances. 

The assessment schedule allots 2 hr. to the inter- 
view, though it may be terminated in less than the 
scheduled time if in the judgment of the interviewer 
all pertinent topics have been covered adequately. 
Upon completion of the interview S is reassured of 
the confidentiality of the information, a report to 
be made to the assessment staff only. 

Immediately following each interview the inter- 
viewer, using a dictating machine, dictates a report 
from his notes. He is not asked to rate S at this 
time. The recorded interview report is played to the 
assessment staff at the evaluation meeting. At this 
time the interviewer, as a member of the staff, par- 
ticipates in evaluating S after the evidence from all 
the assessment techniques has been reported (Bray
& Grant, 1966). At a later date the interview report 
is transcribed and filed with all the assessment data 
concerning S. 

In general the assessment center interviewers are 
professional psychologists. The interviews analyzed 
for this article were conducted by six persons, five 
of whom are psychologists.2 A majority of the 
interviews were conducted by two of the five 
psychologists. 


Analysis 


The Ss for this investigation of the interview are 
348 men who are participating in the Management 
Progress Study. Of these, 200 had graduated from 
college prior to Bell System employment while 148 
had not been employed as college graduates but had 
been promoted to management early in their careers. 
The great majority of Ss were in their 20s when 


2 The interviews were conducted by Warren D.
Bachelis, David E. Berlew, Donald C. Dewar, John 
J. Hopkins, David B. Muirhead, and Joseph F. 
Rychlak. 
originally assessed; a few of the noncollege Ss were
in their early 30s. The Ss came from several regions 
of the country. 

For purposes of analysis each interview report was 
coded independently by two advanced graduate stu- 
dents majoring in psychology who had not partici- 
pated in the assessment. Using a manual the coders 
rated each report on 18 variables. The variables were 
selected from the 25 variables rated by the assessment 
center staffs in making their evaluations of Ss (Bray 
& Grant, 1966). After reviewing several interview 
reports it was judged likely that information perti- 
nent to these variables had been obtained by the 
interviewers. The variables and their definitions are:


Personal Impact—Forcefulness 
How forceful an early impression does he make? 
Consider the impression he made on the inter- 
viewer. 
Oral Communications Skills 
How effectively does he express himself? Con- 
sider ease of expression, correct use of English, 
vocabulary, precision in explaining views, vocal 
clarity, and tonal quality. 
Human Relations Skills 
How well can this man get people to perform 
effectively by good human relations techniques? 
(His sincerity is irrelevant.) 
Personal Impact—Likability 
How likable an early impression does he make? 
Consider the impression he made on the inter- 
viewer. Did the interviewer tend to like or dislike 
him?
Behavior Flexibility 
How readily can he, when motivated, modify 
his behavior to reach a goal? Consider tendencies 
to persevere and frequency with which he has 
adapted to changing circumstances. 
Need Approval of Superiors 
To what extent does he seek approval of persons 
in authority over him? Consider his dependence 
on superordinates for help and guidance as well 
as tendencies to solicit praise and support from 
them. 
Need Approval of Peers 
To what extent does he seek approval of his 
peers? Consider his dependence on his coordinates 
for help and guidance as well as tendencies to 
solicit support from them. 
Tolerance of Uncertainty 
To what extent will his work performance stand 
up under uncertain or unstructured conditions? 
Consider his need for structure and the impact of 
lack of structure on his behavior. 
Inner Work Standards 
To what extent will he want to do a good job 
even if a less good one is acceptable to his boss 
and others? Consider the quality of results he 
expects of himself and of others (e.g., subordi- 
nates). 


3 The interview reports were coded by Byron
Fiman and Virginia Ellen Schein.


Primacy of Work 
To what extent will he find satisfactions from 
work more important than satisfactions from 
other areas of life? Consider the value he places 
on work, the satisfactions he obtains from it rela- 
tive to other satisfactions (e.g., family, hobbies, 
community activities) and his willingness to devote 
more than the required time to his job. 
Energy 
How continuously can he sustain a high level 
of work activity? Consider his general activity 
level, the effort he puts into his work, and his 
reactions to expending energy (e.g., evidence of 
fatigue). 
Goal Flexibility 
To what extent will he be able to change his life 
goals (such as money, power, fame, etc.) in accord- 
ance with reality opportunities? Consider what he 
says are his goals and his commitment to them. 
Need Advancement 
To what extent will he need to be promoted 
significantly earlier than his peers in a job? Con- 
sider the level he aspires to and the rapidity with 
which he expects to achieve it. 
Need Security 
To what extent does this man need a secure job 
(not necessarily with the Bell System)? Consider
his motives in accepting a position in the Bell 
System, his views about leaving the System, and 
his views regarding alternative employment. 
Social Objectivity 
How free is he from prejudices against racial, 
ethnic, socioeconomic, educational, and other kinds 
of groups? Consider strength and inclusiveness of 
prejudice. 
Bell System Value Orientation 
To what extent is he likely to incorporate early 
in his career Bell System values such as service, 
friendliness, justice of company position on rate 
increases, etc.? Consider his identification with the
System, including his desire to remain in it despite
possible disappointment of his personal goals.
Ability to Delay Gratification 
To what extent will this man be able to work 
over long periods of time without great rewards 
in order to reach later rewards? Consider his 
tolerance for frustration, patience, and the long- 
range vs. short-range nature of his goals. 
Range of Interests 
To what extent is he interested in a variety 
of fields of human activity such as science, politics, 
sports, music, art, etc.? Consider his leisure time
activities, hobbies, reading habits, community 
activities, etc. 


A 5-point scale was used in rating each variable. 
As an example the scale for Oral Communications 
Skills is shown below: 


1. Expresses himself very poorly. 

2. Expresses himself rather poorly. 

3. Expresses himself well in some ways, poorly in
others.

4. Expresses himself well.

5. Expresses himself very well.


The coders were instructed to read each interview 
report and to note, using abbreviations for the vari- 
ables in the margin, information pertinent to evalu- 
ating the variables. After reviewing the evidence for 
a given variable the coders, using the scale provided, 
recorded a rating. The coders were further instructed 
to omit ratings on variables for which the evidence 
was inadequate and, as far as possible, to avoid the 
error of central tendency in making their ratings. 

The first step in analyzing the data was to group 
the coded reports according to the educational back- 
grounds of Ss (i.e., college graduates and noncollege) 
at time of employment. For each variable the per- 
centage of reports rated by both coders in each 
sample was computed. The reliability of the coding 
procedure was determined by correlating the ratings 
of the two coders (omitting ratings not recorded 
by both) on each variable for each sample of Ss. 
The Spearman-Brown prophecy formula then was 
applied to the correlations. 
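
The reliability steps just described can be sketched in Python as an illustration. The sketch assumes an omitted rating is stored as `None`, that the correlation is a Pearson product-moment coefficient, and that the Spearman-Brown formula is applied with k = 2 because the ratings of two coders are pooled. The function names and the sample ratings are hypothetical.

```python
from math import sqrt


def both_rated(coder_a, coder_b):
    """Keep only reports for which both coders recorded a rating."""
    return [(a, b) for a, b in zip(coder_a, coder_b)
            if a is not None and b is not None]


def pearson_r(pairs):
    """Product-moment correlation between the two coders' ratings."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / sqrt(var_a * var_b)


def spearman_brown(r, k=2):
    """Estimated reliability of k pooled ratings from the single-coder r."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical 5-point ratings on one variable for six interview reports.
coder_a = [4, 2, None, 5, 3, 1]
coder_b = [5, 2, 3, 4, None, 1]

pairs = both_rated(coder_a, coder_b)
percent_rated = round(100 * len(pairs) / len(coder_a))
r = pearson_r(pairs)
print(percent_rated, round(r, 2), round(spearman_brown(r), 2))
```

Because the stepped-up coefficient estimates the reliability of the pooled ratings, it is never lower than the single-coder correlation when that correlation is positive.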

The ratings of the coders (using only those 
recorded by both) were pooled by simple addition 
and from the resulting sum scores the means, 
standard deviations, variances, and intercorrelations 
of the variables computed. The sum scores also were 
correlated with the judgments of the assessment staff, 
scores from other assessment techniques, and progress 
in management as reflected by a salary criterion. 
In essence, the analyses made are parallel to those 
for the other assessment techniques previously 
studied (Bray & Grant, 1966; Grant, Katkovsky, & 
Bray, 1967). 
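
The pooling step can be sketched as follows. This is a minimal illustration, assuming that an omitted rating is represented by `None` and that the sample variance is the statistic wanted; the ratings shown are hypothetical, and the helper name `pooled_scores` is not from the source.

```python
from statistics import mean, variance


def pooled_scores(coder_a, coder_b):
    """Sum the two coders' ratings, keeping only interviews rated by both.

    A rating of None means the coder omitted that variable for lack of evidence.
    """
    return [a + b for a, b in zip(coder_a, coder_b)
            if a is not None and b is not None]


# Hypothetical ratings on the 5-point scale for five interviews.
coder_a = [4, 3, None, 5, 2]
coder_b = [5, 3, 4, None, 1]

sums = pooled_scores(coder_a, coder_b)   # only interviews 1, 2, and 5 qualify
sum_mean = mean(sums)
sum_variance = variance(sums)            # sample variance (n - 1 in the denominator)
```

Because each coder uses a 1-5 scale, a sum score built this way ranges from 2 to 10.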


RESULTS 


Table 1 shows for each variable the number 
and percentage of interviews which both 
coders were able to rate. The percentages for 
the college graduate sample range from 56% 
to 99%, averaging 90% with 12 of the 18 
variables being rated for 90% or more of the 
interviews. The corresponding figures for the 
noncollege group are markedly lower. They
range from 33% to 99%, average 79%, and
only 8 of the variables have percentages of
90 or better. No explanation is at hand for
the lesser ratability of the noncollege
interviews. It could be that interviewers
and coders with extensive higher educa- 
tion are more effective in evaluating college- 
educated Ss. 

Several variables were nearly always 
ratable for both samples. These included be- 
havior flexibility, goal flexibility, need ad- 
vancement, Bell System value orientation, 
ability to delay gratification, and range of 


TABLE 1

NUMBER AND PERCENTAGE OF INTERVIEWS CODED

                                  College sample   Noncollege sample
                                    (N = 200)          (N = 148)
Variable                            n       %          n       %

Personal Impact—Forcefulness       157     79          49     33
Oral Communication Skills          155     78          95     64
Human Relations Skills             185     93          56     38
Personal Impact—Likability         [?]     56          51     34
Behavior Flexibility               198     99         143     97
Need Approval—Superiors            165     83         116     78
Need Approval—Peers                170     85         105     71
Tolerance of Uncertainty           191     96         125     84
Inner Work Standards               197     99         135     91
Primacy of Work                    185     93         142     96
Energy                             187     94         114     77
Goal Flexibility                   190     95         146     99
Need Advancement                   197     99         146     99
Need Security                      174     87         125     84
Social Objectivity                 184     92         122     82
Bell System Value Orientation      197     99         142     96
Ability to Delay Gratification     190     95         142     96
Range of Interests                 193     97         146     99

Mean                               179     90         117     79





interests. Variables that proved clearly less 
ratable from the interview reports included 
personal impact—forcefulness, oral communica-
tions skills, and personal impact—likability.

It may be surprising that such variables as 
personal impact and oral communications 
skills were not consistently ratable since they 
are qualities eminently observable in the 
interview situation. This paradox is resolved 
by remembering that these variables were 
freely observable throughout much of the 3½
days of assessment, and the interviewers felt
no strong need to report on these character- 
istics. It must be emphasized that the research 
reported in this article is not based on an 
interview designed to cover all variables 
equally but an interview which was intended 
to supplement the rest of the assessment 
process. 

The estimated reliabilities of the ratings
are presented in Table 2 (Ns for each vari-
able being the same as in Table 1). For the
college graduate sample the range is from .73
to .92, with a median of .82. The reliabilities
for the noncollege sample tend to be lower,
ranging from .00 to .92 with a median of .72.
Whereas only 6 of the reliabilities are below 
.80 in the college graduate sample, 11 fall 
below .80 in the noncollege sample. The inter- 
view reports, it can be seen, yielded not only 
less information but also less reliable infor- 
mation for the noncollege as compared to the 
college sample. (Simple practice in coding is 
not the reason for these results; the college 
sample was coded first.) Comparison of 
Tables 1 and 2 indicates that the most com- 
plete and reliable information for both 
samples was obtained on the variables of need 
advancement, tolerance of uncertainty, social 
objectivity, and range of interests. 

To discover possible clues for the discrep- 
ancies between the reliabilities of several of 
the variables, comparisons were made between 
the variances of the variables (Table 3; Ns
same as in Table 1). F ratios were computed
with the larger variance in the numerator.
Statistically reliable differences in the
variances of eight of the variables were ob- 


TABLE 2

CODER RELIABILITIES

                                  College     Noncollege
                                  sample        sample
Variable                           r11           r11

Personal Impact—Forcefulness       .90           .92
Oral Communication Skills          .92           .92
Human Relations Skills             .82           [?]
Personal Impact—Likability         .85           .89
Behavior Flexibility               [?]           .08
Need Approval—Superiors            .79           .74
Need Approval—Peers                .82           .67
Tolerance of Uncertainty           .80           .80
Inner Work Standards               .82           .67
Primacy of Work                    .76           [?]
Energy                             .84           .70
Goal Flexibility                   .84           [?]
Need Advancement                   .86           .90
Need Security                      .90           .68
Social Objectivity                 .90           .89
Bell System Value Orientation      .76           [?]
Ability to Delay Gratification     [?]           .00
Range of Interests                 .78           .79








TABLE 3

VARIANCES AND F RATIOS

                                  College     Noncollege
                                  sample        sample
Variable                          variance     variance        F

Personal Impact—Forcefulness       3.99          4.75         1.19
Oral Communication Skills          4.50          5.57         1.24
Human Relations Skills             2.31          3.65         1.58*
Personal Impact—Likability         2.02          2.87         1.42
Behavior Flexibility               2.00          1.08         1.85**
Need Approval—Superiors            2.10          1.97         1.07
Need Approval—Peers                2.71          2.04         1.33
Tolerance of Uncertainty           2.70          2.96         1.10
Inner Work Standards               2.25          1.76         1.28
Primacy of Work                    1.67          1.74         1.04
Energy                             1.73          2.38         1.38*
Goal Flexibility                   3.06          [?]          [?]
Need Advancement                   [?]           3.74         [?]
Need Security                      4.28          [?]          [?]
Social Objectivity                 4.51          4.35         1.04
Bell System Value Orientation      1.84          1.39         1.32*
Ability to Delay Gratification     1.94          1.46         1.33*
Range of Interests                 1.93          1.74         1.11

* .02 < p < .10 that variances are equal.
** p < .02 that variances are equal.


tained. In five of these instances the vari- 
ance for the college sample was larger. In the 
instance of behavior flexibility, the variance 
for the college graduate sample is nearly 
double that for the noncollege sample. The 
relatively low variance for the latter prob- 
ably contributes to the low reliability of the 
ratings (r11 = .08) and suggests that for the
noncollege sample the coders had difficulty 
discriminating on this variable. For the re- 
maining variables, however, the possible influ- 
ence of differences in the variances on discrep- 
ancies in the reliabilities of the ratings would 
be difficult to assess. 
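
The variance comparison can be sketched in a few lines. This is an illustration only: it forms the ratio with the larger variance in the numerator, as the text describes, and does not attempt the significance test, which would also need the degrees of freedom for each sample. The function name is hypothetical; the example variances are the legible entries for personal impact—forcefulness and social objectivity in Table 3.

```python
def variance_ratio(var_a: float, var_b: float) -> float:
    """F ratio of two sample variances, larger variance in the numerator."""
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    if smaller <= 0:
        raise ValueError("variances must be positive")
    return larger / smaller


# College and noncollege variances as printed in Table 3.
f_forcefulness = round(variance_ratio(3.99, 4.75), 2)   # 1.19
f_objectivity = round(variance_ratio(4.51, 4.35), 2)    # 1.04
```

Placing the larger variance on top means the ratio is never below 1.00, so the direction of the difference has to be read from the two variance columns rather than from F itself.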

The intercorrelations between the interview 
variables, disregarding signs, range from 0 
to .71 (Table 4). Once again the coefficients 
for the college sample are generally higher 
than for noncollege men. The median inter- 
correlation for the former group is .24 as 
compared to .17 for the latter. The size of 
these correlations indicates that the interview 
TABLE 5

CORRELATIONS WITH STAFF JUDGMENTS

College sample

                                                        Factor
Variable                            I   II  III   IV    V   VI  VII VIII   IX    X   XI

Personal Impact—Forcefulness       52  [?]  [?]   42   44   29   22   36  -38  -16  [?]
Oral Communication Skills          46   46  -23   33   36   23   40   26  -26  -06   32
Human Relations Skills             29   27  -20   16   28   23   06   19  -25  -14   14
Personal Impact—Likability         27   33   00  [?]   39   13   04   31  -03  -01   08
Behavior Flexibility               23   22  -18   12   21   18   09   10  -16  -12   17
Need Approval—Superiors           -17  -09   27  -11  -05  -14  -08   03   20   27  -24
Need Approval—Peers               -27  -21   30  -22  -12  -16  -16  -08   24   25  -21
Tolerance of Uncertainty           31   26  -27   17   20   30   18   15  -26  -20   26
Inner Work Standards               24   26  -08   10   18   19   09   40  -16  -03   03
Primacy of Work                    25   25  -16   13   16   19   05   35  -22  -10   03
Energy                             32   30  -25   20   30   26   05   24  -32  -15   14
Goal Flexibility                  -30  -27   22  -20  -20  -19  -13  -31   24   13  -02
Need Advancement                   41   30  -43   27   23   23   23   27  -57  -24   33
Need Security                     -40  -30   44  -17  -23  -23  -29  -19   50   21  -47
Social Objectivity                 16   19   00   10   10   13   19   17   00  -01   28
Bell System Value Orientation     -11  -07   13  -06  -03  -06  -17   01   12   05  -18
Ability to Delay Gratification    -04  -03   05   00  -06   04   02  -03   10  -01   01
Range of Interests                 23   27   01   14   24   13   35   15  -05  [?]   17

Noncollege sample

                                                Factor
Variable                            I   IV    V   VI  VII VIII   IX    X

Personal Impact—Forcefulness       48   24   25   25   03   00  -58  -25
Oral Communication Skills          44   33   36   24   47   23  -24  -19
Human Relations Skills             43   32   46   45   10   14  -06  -11
Personal Impact—Likability         06   07   09   09  [?]   02   11  -09
Behavior Flexibility               06  -02  [?]   00   06  -01   06   14
Need Approval—Superiors           -24  -20  -23  -34  -22  -10  -03   07
Need Approval—Peers               -23  -25  -17  -22  -29  -15   02   00
Tolerance of Uncertainty           29   21   31   40   18   20  -08  -23
Inner Work Standards               09   01   07   01  -03   43  -05   17
Primacy of Work                    21   03   20   13  -19   32  -25   10
Energy                             32   05   31   21   05   35  -15   00
Goal Flexibility                  -20  -16  -14  -16   10  -17   41   17
Need Advancement                   40   29  [?]   19  -14   31  -67  -06
Need Security                     -25  -29  -10  -05  -22  -03   37   12
Social Objectivity                 19   18   19   15   19   13   02   28
Bell System Value Orientation     -07  -16   02   03  -13   06   20   18
Ability to Delay Gratification     02   10   03   26   11   04   15  -12
Range of Interests                 19   19   19   14   33   11  -10  -04


variables were relatively independent. Never-
theless, some clustering of the variables, par-
ticularly in the college graduate sample, seems
apparent. One cluster (personal impact—
forcefulness, oral communications skills, and
human relations skills) incorporates variables
reflecting interpersonal skills. Another (needs
for approval—superiors and peers) reflects de-
pendence on others. A third (inner work
standards, primacy of work, and energy) con-
cerns work motivation. Finally, the negative
correlation between need advancement and
need security suggests career motivation. Of
the remaining variables several (i.e., personal
impact—likability, behavior flexibility, toler-
ance of uncertainty, and goal flexibility) have
fairly substantial correlations with variables
in one or more of the clusters, which in turn
tend to overlap considerably with each other.
Four variables (social objectivity, Bell System
value orientation, ability to delay gratification,
and range of interests) have relatively low
correlations with all other variables.

The main question to which this article
is addressed is the role of the interview in
the total assessment center process. The cor-
relations between each of the interview vari-
ables and variables reflecting assessment staff
judgments made on the basis of all the assess-
ment techniques are shown in Table 5 (Ns
same as in Table 1).

Scores based on factorial analyses of 25
characteristics rated by the staff (Bray &
Grant, 1966) follow. The characteristics,
selected for their relevance to the study, in-
clude managerial skills, interpersonal relation-
ships, general abilities, motives, values, and
attitudes. The factors obtained and their
designations are

Factor                          Identification
I                               General Effectiveness
II (college sample only)        General Effectiveness
III (college sample only)       Passive Dependency
IV                              Administrative Skills
V                               Interpersonal Skills
VI                              Control of Feelings
VII                             Intellectual Ability
VIII                            Work-oriented Motivation
IX                              Passivity
X                               Dependency
XI (college sample only)        Nonconformity


Table 5 shows that judgments of personal
characteristics based on the interview reports
alone and made independently of the assess-
ment staff judgments, which were based on
information from all of the assessment tech-
niques, correlate substantially with assessment
staff judgments. Furthermore, the consisten-
cies in magnitude and direction of the correla-
tions from sample to sample are relatively
high.

To assist in interpreting these data, inter-
view variables correlating .30 or higher with
a staff judgment factor for both the college
and noncollege samples are listed below under
each staff judgment factor:


I. General Effectiveness
Personal Impact—Forcefulness
Oral Communication Skills
Need for Advancement
Energy

IV. Administrative Skills
Oral Communication Skills

V. Interpersonal Skills
Oral Communication Skills
Energy

VI. Control of Feelings
Tolerance of Uncertainty

VII. Intellectual Ability
Oral Communication Skills
Range of Interests

VIII. Work-oriented Motivation
Personal Impact—Forcefulness
Inner Work Standards
Primacy of Work

IX. Passivity
Need for Advancement (negative)
Personal Impact—Forcefulness (negative)
Need for Security

X. Dependency
(none)


It is apparent from the above that some of 
the characteristics judged solely on the basis 
of interview reports do correlate substantially 
with staff judgments based on all the assess- 
ment techniques. It is also clear that some 
interview variables showed more pronounced 
relationships than others. Personal impact— 
forcefulness, oral communication skills, en- 
ergy, and need advancement, for example, 
were obviously potent interview variables. 

On the whole, the correlations for the 
selected variables are meaningful. One might 
expect, for example, that judgments of ad- 
ministrative skills, interpersonal skills, and 
intellectual ability would relate to ratings 
based on interview information for oral com- 
munications skills. Most of the remaining 
relationships make sense. The only correla- 
tions which are difficult to interpret are those 
between personal impact—forcefulness and
work-oriented motivation. 

Just how much influence the interview may
have had on assessment staff judgments can-
not be ascertained from these data. Com-
parison of the correlations in Table 5 with
those for other assessment techniques (Grant, 
Katkovsky, & Bray, 1967, p. 231), however, 
shows that the interview appears to have 
played a primary role in the judgment of 
the factor of passivity. In addition, the cor- 
relations between some of the interview vari- 
ables and other factors are of the same magni- 
tude as those for other assessment techniques. 
Since most assessment staff members are con-
vinced that the interview report does not in-
fluence their ratings as much as behavior in
the simulations or test results, these correla-
tions are probably not produced by the influ-
ence of the interviewer report. Instead, the
interview appears to tap some of the same
dimensions without necessarily influencing
the judgments.

A staff judgment on which the interview 
report clearly has a direct influence is that 
of passivity. This factor, which might be 
better labeled “career passivity,” involves the 
lack of a strong need to advance in the 
organization, a willingness to wait for ad- 
vancement, and an emphasis on job security. 
The correlation of the interview variable of 
need advancement with this factor for both 


TABLE 6

CORRELATIONS WITH STAFF PREDICTIONS

                                  College     Noncollege
Variable                          sample        sample

Personal Impact—Forcefulness       .49*          [?]
Oral Communication Skills          .41*          .48*
Human Relations Skills             [?]           .38
Personal Impact—Likability         .25           .14
Behavior Flexibility               .19*          [?]
Need Approval—Superiors           -.02          -.20*
Need Approval—Peers               -.21*         -.13
Tolerance of Uncertainty           [?]           [?]
Inner Work Standards               [?]           .07
Primacy of Work                    .20*          [?]
Energy                             .25*          [?]
Goal Flexibility                  -.30*         -.21*
Need Advancement                   .28*          .42*
Need Security                     -.28*         -.17
Social Objectivity                 .03           .18*
Bell System Value Orientation     -.13          -.05
Ability to Delay Gratification     .01           .03
Range of Interests                 [?]           [?]

* p < .05 that ρ = .00.
the college and the noncollege groups (-.57
and -.67) is higher than for any other
assessment measure.

A complete evaluation of the relative con- 
tributions of the interview to the assessment 
process awaits further studies. Regression 
analyses of all assessment techniques with 
scores based on the factorially derived charac- 
teristics are under way. In addition, analyses 
relating interviews and other assessment tech- 
nique variables to the 25 characteristics di- 
rectly rated by the assessment staff are 
planned. Though scores based on the factori- 
ally derived characteristics have proved useful 
in making studies of the interview and other 
assessment techniques, information pertinent 
to relatively specific characteristics (e.g., Bell 
System Value Orientation) has undoubtedly 
been omitted in the process. 

In addition to rating 25 qualities (those 
summarized in the factors of Table 5), the 
assessment staff made predictions of progress 
in the management hierarchy. The specific 
prediction was whether each S would reach
middle management within 10 yr. from the 
time of assessment. These predictions have 
proved significantly accurate (Bray & Grant, 
1966). Table 6 (same Ns as in Table 1) 
shows the correlation of each interview vari- 
able with this staff prediction. 

An inspection of the table reveals that 22 
of the 36 correlations are statistically signifi- 
cant at the .05 level. Once again the present 
data cannot reveal the extent to which the 
staff was influenced by the interview report, 
but it is clear that the interview successfully 
captured characteristics relevant to the total 
assessment process. Significant for both the 
college and noncollege samples were the inter- 
view variables of oral communication skills, 
human relations skills, tolerance of uncer- 
tainty, goal flexibility, need advancement, and 
range of interests. 
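
The source does not say which test produced the significance markers in Table 6. As one plausible sketch, and only under that assumption, the check below uses the large-sample Fisher z approximation for testing a zero population correlation at the two-sided .05 level. The function name is hypothetical; the example values pair legible college-sample correlations from Table 6 with the matching Ns from Table 1.

```python
import math


def r_differs_from_zero(r: float, n: int, z_crit: float = 1.96) -> bool:
    """Two-sided large-sample test of rho = 0 via Fisher's z transformation.

    z_crit = 1.96 corresponds to the .05 level under the normal approximation.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > z_crit


# Personal impact—forcefulness: r = .49 with 157 rated interviews.
forcefulness_significant = r_differs_from_zero(0.49, 157)
# Social objectivity: r = .03 with 184 rated interviews.
objectivity_significant = r_differs_from_zero(0.03, 184)
```

With these inputs the first correlation clears the criterion easily and the second does not, which agrees with the markers printed for those two entries.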


Studies of the overlap of the interview with
other assessment techniques must also await
a total analysis of the assessment process. It
may be interesting, however, to note the rela-
tionship in the college sample of one of the
more important interview variables, personal
impact—forcefulness, to some of the other
assessment measures. Forcefulness in the
interview correlates .35 with contribution to
TABLE 7

CORRELATIONS WITH SALARY PROGRESS

                                  College graduates      Noncollege
Interview variable                   N       r           N       r

Personal Impact—Forcefulness         71     .26*         43     [?]
Oral Communication Skills            69     [?]          69     .50*
Human Relations Skills               76     .20          48     .41*
Personal Impact—Likability           50     [?]          37    -.11
Behavior Flexibility                 80     .30*        119     .04
Need Approval—Superiors              71    -.36*        102    -.27*
Need Approval—Peers                  75    -.36*         96    -.17
Tolerance of Uncertainty             74     .06         105     [?]
Inner Work Standards                 80     .07         114     .08
Primacy of Work                     [?]     .30*        118     [?]
Energy                               74     [?]         [?]     .16
Goal Flexibility                     74    -.08         120    -.16
Need Advancement                     80     .49*        120     .44*
Need Security                        75    -.35*        112    -.26*
Social Objectivity                   75     [?]         [?]     .20*
Bell System Value Orientation        80    -.11         120    -.10
Ability to Delay Gratification       76    -.16         116     [?]
Range of Interests                   79     .28*        121     [?]

* p < .05 that ρ = .00.


the Manufacturing Problem, .49 with con-
tribution to the Group Discussion Problem,
.35 with Projective Test achievement motiva-
tion, .35 with Projective Test willingness to
assume a leadership role, -.30 with Projec-
tive Test dependence, .32 with need domi-
nance on the Edwards Personal Preference
Inventory, and .32 with ascendance on the
Guilford-Martin Inventory. (These are the
highest seven correlations of this interview
variable with 41 measures from other tech-
niques for the college sample.)

A final question is the extent to which 
the interview report ratings are directly re-
lated to progress in management. The data
in Table 7 concern 81 college graduates
having 8-10 yr. of experience in two telephone
companies and 122 noncollege men with
8-9 yr. of experience in two such companies.
The correlations shown are average correla-
tions for the two company samples. The
progress criterion was obtained by computing
the difference between each S’s salary at the 


time of assessment and his salary on June 30, 
1967. 
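
A minimal sketch of the criterion and the averaging step follows. It assumes a simple unweighted mean of the two within-company Pearson correlations; the source does not say whether the average was weighted or taken through a z transformation. All salaries, ratings, and function names below are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def salary_progress(at_assessment, at_followup):
    """Progress criterion: later salary minus salary at time of assessment."""
    return [later - earlier for earlier, later in zip(at_assessment, at_followup)]


def average_correlation(company_samples):
    """Unweighted mean of the within-company correlations (an assumption)."""
    return mean([pearson_r(ratings, progress) for ratings, progress in company_samples])


# Hypothetical pooled ratings and salaries for two companies.
company_a = ([4, 6, 8, 9], salary_progress([500, 520, 510, 530], [600, 650, 660, 700]))
company_b = ([3, 5, 7], salary_progress([480, 500, 490], [560, 570, 600]))
avg_r = average_correlation([company_a, company_b])
```

Correlating within each company before averaging keeps differences in salary scale between the two companies from inflating or masking the relationship.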

Eighteen of the 36 coefficients in this table 
are significant at the .05 level, 9 each for the 
college and noncollege groups. Interview vari- 
ables reliably predictive for both groups were 
need approval—superiors (negative), primacy 
of work, need advancement, need security 
(negative), and range of interests. 

These results are comparable to those re-
lating the interview report variables to
assessment staff evaluations, i.e., the staff 
predictions (Table 6) and scores based on 
the general effectiveness factor (Table 5). 
Though the specific correlations vary, the 
patterns of correlations are roughly similar. 
The findings thus indicate that the assessment 
staff was interpreting organizational values 
correctly. 


DISCUSSION 


The results of this investigation of the 
assessment interview clearly indicate that the 
interview reports contributed to the assess- 
ment process. Judgments of career motivation 
apparently depended heavily on the interview. 
Work motivation and control of feelings 
ratings also appear to have been influenced by 
the interview information. In addition, judg- 
ments of interpersonal skills were at least 
reinforced, if not influenced, by the interview
reports.

That information from the interview was 
also predictive is demonstrated by the large 
number of statistically significant correlations 
with success in management. Variables re- 
flecting career motivation, dependency needs, 
work motivation, and interpersonal skills were 
related to individual differences in salary 
increases. 

As noted previously, this investigation does 
not in itself establish the relative weight of 
the interview as compared to the other tech- 
niques in the assessment center process. It is, 
furthermore, not an experimental test of how 
much the interview could accomplish. The 
interviewers made, for example, no efforts to 
uncover information on administrative skills 
since this was presumably adequately covered 
elsewhere. The results demonstrate, neverthe- 
less, that the interview did produce reliable 
ratings of managerial qualities which cor-
related significantly with ratings made on
the basis of several other techniques and 
with advancement. 

The method used in quantifying the inter- 
view data is, of course, more characteristic of 
that used in analyzing interview information 
obtained from surveys than that used in 
studying assessment or selection interviews. 
The ratings of selected variables were made 
by independent coders rather than by the 
persons doing the interviews. Whether inter- 
viewer ratings would have produced similar 
results cannot be determined. The findings 
are pertinent to a suggestion by Ulrich and 
Trumbo (1965), however, that the selection 
interviewer should function as an informa- 
tion gatherer and reporter, leaving selection 
decisions up to others. 

With regard to various issues in interview- 
ing raised by reviewers of research on the 
topic (Mayfield, 1964; Ulrich & Trumbo, 
1965), the findings of this study tend to shed 
either light or confusion, depending on one’s 
point of view. In contrast to their findings, 
the interview reports from relatively unstruc- 
tured interviews yielded quite reliable and 
valid (i.e., predictive) information. Further- 
more, the information on many of the vari- 
ables was sufficiently complete for two coders 
to make ratings. Perhaps the issue is not that 
of structured vs. unstructured interviews but 
of interviewer skill and understanding of what 
is to be covered. 

Finally, with regard to the personal charac- 
teristics an interviewer can identify, the find- 
ings of this study tend to be more supportive 
of the views of Ulrich and Trumbo (1965) 
than of Mayfield (1964). The interviewers 
did identify career motivation and inter-
personal skill characteristics. They apparently
also obtained reliable information on addi- 
tional characteristics. They were not asked to 
obtain information on intellectual abilities 
per se, though some of the characteristics 
identified did correlate substantially with 


assessment staff judgments of intellectual 
abilities and with mental ability test scores. 

The findings of this study give positive 
support to the use of the interview in assess- 
ment center procedures. They also suggest 
several possible areas for research on the 
interview. Among such would be studies com- 
paring judgments of interviewers themselves 
with those made by others on the basis of the 
interview reports. Such studies would bear 
on the issue of whether interviewers should 
be primarily reporters or assessors. Additional 
studies on the personal characteristics which 
an interviewer can identify would also be 
useful. The findings of this study suggest 
that the interview may have considerable 
scope and still be reliable and valid. 

In order to observe the extent to which a person’s behavior is variable on 1 set
of tasks and on another set of similar tasks, and to observe the extent to which 
his behavior is variable on 1 set of tasks and on a set of dissimilar tasks, 100 Ss 
were given 6 tests, a different form of each test being administered on 20 
successive days. Variance indices, reflecting intraindividual variability over time, 
were derived from the even- and odd-numbered test forms, and the correlations 
between variability on the even- and odd-numbered forms ranged from .25 to 
.89. Variance indices on the 6 different tests correlated with one another from 
-.23 to .53. Temporal intraindividual variability on some tasks can be reliably
and meaningfully observed. Such variability is not widely generalized over a
large portion of the behavior domain, nor is it specific to each task or
behavior. Variability on some tasks is related to variability on certain other
tasks.


The extent to which an individual’s be- 
havior is consistently variable on a given task 
or on different tasks may provide cues as to 
the effectiveness of his behavior. The vari- 
ability studied here corresponds to that de- 
scribed by Fiske and Maddi (1961): “one 
form of variability, the variation in the be- 
havior of a given organism at different times 
but under the same external conditions [p. 
327].” This form of variability is to be con- 
trasted with that discussed by Hull (1927) 
who was concerned with variability in the 
amount of different traits possessed by an in- 
dividual, and also to be contrasted with the 
variability discussed by Wechsler (1952) who 
was concerned with the variability of a given
trait within the population.

Theoretically, an individual cannot repro-
duce a behavior identically, since once he
has performed a task, repetition of that task
must be influenced by its prior performance.
In spite of the impossibility of studying the
individual’s variability while performing the
same task, his variability can be studied while
performing a homogeneous group of tasks.


1 This project was supported with funds from the
United States Office of Education, Office of Educa-
tion Grant 3-7-068694-2082. Appreciation is ex-
pressed to Richard Arvey and Diane Johnson Tins-
ley for their contribution in analyzing the data.

2 Requests for reprints should be sent to the
author, Student Life Studies, Office of the Dean of
Students, University of Minnesota, 2001 Riverside
Avenue, Minneapolis, Minnesota 55455.


Tasks and situations change from performance 
to performance, but these can be ordered 
into highly similar categories and variability 
of behavior so studied. 

Fiske provided one theoretical basis for 
the analysis of variability. He regarded in- 
dividual variability as having a coping func- 
tion which affects the organism’s adaptability. 
When several alternate behaviors are available 
to the organism, its inherent variability in- 
creases the likelihood that the organism 
eventually will select and adopt the response 
which best copes with a given situation. Fiske 
and Maddi (1961) defined several questions 
concerning individual differences in variability. 

In an earlier study, Berdie (1961) found 
that variability on a task involving advanced 
high school mathematics appeared consistent 
within the individual and that this variability 
might be related to the extent to which a 
person’s college achievement could be pre- 
dicted on the basis of aptitude tests. The two 
questions approached in the present study 
were: (1) Can the variability over time of 
an individual’s behavior be reliably and con- 
sistently observed? (2) To what extent are 
persons variable over time on one task also 
variable on other tasks? 

If variability over time can be reliably ob- 
served and is not specific to each task, then 
the variability of an individual may be a 
useful concept in understanding his behavior. 
For example, variable persons may be more 
or less predictable or more or less task
oriented. 
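
The first question can be made concrete with a short sketch. It assumes the variance index is the ordinary sample variance of one person's raw scores, computed separately over the odd- and even-numbered administrations; the exact index is not defined in this part of the text, and the scores and function name shown are hypothetical.

```python
from statistics import variance


def split_half_variability(scores):
    """Return (odd-form variance, even-form variance) for one person.

    scores lists that person's results in order of administration, so
    index 0 is the 1st form taken, index 1 the 2nd, and so on.
    """
    odd_forms = scores[0::2]    # 1st, 3rd, 5th, ... administrations
    even_forms = scores[1::2]   # 2nd, 4th, 6th, ... administrations
    return variance(odd_forms), variance(even_forms)


# Hypothetical scores for one S over six successive days.
odd_var, even_var = split_half_variability([31, 35, 33, 39, 35, 37])
```

Correlating the odd-form index with the even-form index across all Ss then indicates whether variability on a task is a stable characteristic of the person.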


METHOD 
Tasks 


The behavior observed consisted of responses to 
six of the Repetitive Psychometric Measures (scores 
on these tests) developed by Moran and Mefferd 
(1959). Each of the six tests represents a distinct 
factor derived from repeated factor-analytic studies. 
The Aiming test consists of 15 rows each containing
20 circles, and the circles are in rows connected se-
quentially by a line. The S places the test on a piece
of corrugated paper and the task is to punch a hole 
inside as many circles as possible within 90 sec. 
without touching the circles. The Ss in this experi- 
ment used a stylus consisting of a pencil-sized piece 
of wood with a thin pinpoint at one end. The test 
involves the ability to carry out quickly and pre- 
cisely a series of movements depending on eye-hand 
coordination. 

The Flexibility of Closure test requires S to copy 
36 geometric figures into matrices of dots. Each test 
form contains 36 figures. The task, as described by 
the authors, is to retain the image of a specified con- 
figuration despite the influence of other distracting 
configurations in the perceptual field. The Number 
Facility test is similar to French’s N factor and con- 
sists of 90 problems each requiring the addition of 
three two-digit numbers. 

The Perceptual Speed test requires the identifica- 
tion of well-known symbols in a mass of material 
and consists of rows of 30 digits with an encircled 
digit at the left of each row. The task is to cross 
out every digit in the row similar to the encircled 
digit. The time limit specified by Moran and Mef-
ferd is 2½ min., but early experience with this test
suggested that too many Ss completed the test within 
this time limit and the time limit was reduced to 144 
min. 

The Speed of Closure test measures the ability to 
unify an apparently disparate perceptual field into a 
single percept. Each form consists of 22 lines and 
each line has letters in it apparently arranged at ran- 
dom but containing from two to four 4-letter words 
which are to be encircled. The final test, Visualiza- 
tion, consists of tangled lines which must be followed 
visually from their start to finish. 

For each of these tests Moran and Mefferd devel-
oped 20 different forms with the original intent that
the forms would be equivalent. Later study, how- 
ever, indicated that on each of the tests but Number 
Facility the alternate forms were reliably different 
(Moran, Kimble, & Mefferd, 1964) and correction 
factors were provided for the 20 alternate forms of 
these five tests. These correction factors were not 
used in this study, in light of the experimental design. 

The test authors, comparing scores on Form 1 and 
Form 2, reported test-retest reliabilities ranging from 
.72 to .94. Intercorrelations of the six tests, using 
only Form 1, ranged from .09 to .44. Considering the 
purpose for which these tests are to be used here, 
they appeared to be adequately reliable and suffi-
ciently independent from one another.


Sample 


The population from which the Ss were drawn 
consisted of freshmen entering the University of 
Minnesota Institute of Technology in the fall of 
1966. All freshmen were informed of the possibility
of participating in the experiment and, from those 
who volunteered, Ss were selected on the basis of 
class schedules, availability of data, and proximity 
to campus. The Ss consisted of fairly representative 
bright college students who had survived at least one 
demanding academic quarter and who were moti- 
vated to earn $40 by participating in an experiment 
that would cause them no stress or discomfort. 


Procedures 


The experiment was conducted in a well-isolated 
subbasement room with overhead lights and lamps 
arranged so that illumination was not brilliant but 
Ss could see comfortably. No noise from outside the 
building penetrated the room and little traffic passed 
in the corridor outside of the door. Temperature 
in the room was constant and comfortable, although 
when the door was closed there was little ventilation. 
Insofar as Ss remained in the room for periods of 
only 20 min. and the door was kept open for at 
least ½ hr. between sessions, lack of ventilation pro-
duced no discomfort.

The Ss were seated in the center of the room in
classroom chairs with arm tablets. They were divided
into five groups and a group was tested each day at
9:30 A.M., 12:30 noon, 1:30, 2:30, and 3:30 P.M.
Assignments to time periods were based on the class
schedules submitted by Ss.

Approximately one-fifth of the Ss took Form 1 of
the test on the first day, Form 2 on the second, etc.
Another group of Ss took Form 5 on the first day, 
Form 6 on the second day, and on the twentieth day 
took Form 4. Other groups of Ss started on Form 
9, Form 13, and Form 17, in order to provide some 
randomization of form-sequence influence. Within 
each time session, students were randomly assigned 
to sequence groups. 

At the first session, the experimenter read to each 
group an introductory statement, and a trained 
and experienced psychometrist then read the test 
instructions, administered the practice exercises pro- 
vided by Moran and Mefferd, and administered the 
tests.

Testing schedules for each group were arranged 
Monday through Friday for 4 successive wk., and Ss 
who missed sessions made them up during an adja- 
cent session or during the fifth week. Of the 100 
Ss, 62 attended daily. Over 95% of the tests were 
administered to Ss at the time of day originally 
scheduled. 

At the completion of the last form of the last test 
each S completed a questionnaire reporting his 
reaction to the tasks and his perceptions of the pur- 
pose of the experiment. The Ss were told at the first 
session that the purpose of the experiment was to
compare the psychological characteristics, as mea- 
sured by these tests, of students in technology and 
science to those of other students. 


Analysis 


The 12,000 test papers were scored by research 
assistants and, when scoring was completed, 200 
papers for each test representing all of the 20 forms 
were drawn and rescored. The original scores were 
compared to those obtained by rescoring, and fre- 
quencies of errors of various sizes were tabulated 
and correlations determined. Scoring was judged to 
be adequate for five of the six tests but all 2,000 
of the Number Facility tests were rescored. The test 
scores were then entered on basic record cards for 
each student, verified, and then punched and verified 
on IBM cards. 

For each of the six tests a variability index was 
computed for each student. This consisted of the 
variance (SD²) of the 20 raw scores derived from
the 20 forms of the tests. At the same time, for each 
of the six tests a mean score was computed for each 
student, this consisting of the mean of the 20 scores 
derived from the 20 forms. 
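
The two per-student indices just described can be sketched as follows. This is only an illustration: the scores are invented, the function name is hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the text specifies only SD².

```python
from statistics import mean, pvariance


def variability_and_mean(scores):
    """Return (variance index, mean index) for one student's scores on one test."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    # Population variance is an assumption; the text says only "variance (SD squared)".
    return pvariance(scores), mean(scores)


# Hypothetical raw scores on the 20 forms of one test (not data from the study).
example_scores = [41, 43, 44, 42, 45, 43, 44, 46, 42, 43,
                  44, 45, 41, 43, 44, 42, 45, 43, 44, 46]
variance_index, mean_index = variability_and_mean(example_scores)
```

Repeating the call for each of the six tests would give the six raw-score variance indices and six mean indices per student.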

Then, in order to facilitate comparisons between 
tests and to provide a basis for obtaining a total 
variance index, each raw score was transformed to a 
standard score, using a mean equivalent to 50 and a 
standard deviation equivalent to 10, based on the 
distribution of 100 scores of each form of each test. 
For example, the 100 scores on Form 1 of the Aiming 
test were selected, the mean and standard deviation 
calculated, and, for each student, his raw score on 
Form 1 was transformed to a standard score based 
on this distribution. Then, for each of the six tests 
a variability index for each student was computed, 
along with a mean index, based on the 20 standard 
scores. A seventh variance and mean index were 
calculated for each student, based on all 120 of 
the scores. 
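
A minimal sketch of the standard-score transformation described above, assuming that each form's raw scores are rescaled linearly to a mean of 50 and a standard deviation of 10 across the students who took that form. The function name and the five scores are hypothetical (the study used 100 scores per form), and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def to_standard_scores(form_scores, target_mean=50.0, target_sd=10.0):
    """Rescale all students' raw scores on one form to the target mean and SD."""
    m = mean(form_scores)
    sd = pstdev(form_scores)  # population SD; an assumption, not stated in the text
    if sd == 0:
        raise ValueError("all scores identical; standard scores undefined")
    return [target_mean + target_sd * (x - m) / sd for x in form_scores]


# Hypothetical raw scores of five students on one form.
raw = [38, 42, 45, 47, 53]
t_scores = to_standard_scores(raw)
```

Because every form is rescaled separately, differences in difficulty between forms no longer contribute to a student's variance over the 20 standard scores.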

Thus, for each student six variance indices and six 
mean indices based on raw scores were available, 
and seven variance indices and seven mean indices 
were available based on standard scores. 

The consistency of the variability index was re-
vealed by what corresponds to an odd/even reliabil-
ity coefficient. The scores on each of the 10 odd-
numbered forms were used to provide a variance and
a mean, and the scores on each of the 10 even-
numbered forms provided comparable indices. On
each test, each S had two variance indices and 
two mean indices, and in each instance the correla- 
tion was calculated between the even-numbered-form 
and the odd-numbered-form indices. The analysis 
was done first using raw scores and then standard 
scores. 
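
The odd/even procedure can be sketched as below, assuming each student's scores are stored in form order (Form 1 through Form 20). The helper names are hypothetical and the Pearson coefficient is written out by hand so that the sketch depends only on the standard library.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def odd_even_reliability(scores_by_student):
    """Correlate variance indices from odd-numbered and even-numbered forms.

    scores_by_student: one list per student, ordered Form 1, Form 2, ...
    """
    odd_var, even_var = [], []
    for scores in scores_by_student:
        odd_var.append(pvariance(scores[0::2]))   # Forms 1, 3, 5, ...
        even_var.append(pvariance(scores[1::2]))  # Forms 2, 4, 6, ...
    return pearson_r(odd_var, even_var)
```

The same function applied to means instead of variances would give the mean-score reliabilities.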

The correlation coefficients then were calculated 
between the variance indices on the six tests, first 
using the indices based on raw scores and then the 
indices based on standard scores. The mean indices 
were analyzed similarly. 


RESULTS 
Reliability of Indices 


Table 1 shows the correlations between 
variance and mean indices based on odd- 
numbered and even-numbered forms, using 
raw scores. Table 2 presents similar informa- 
tion based on standard scores and includes 
information on the total score, which consists 
of the sum of the standard scores for the six 
tests. The test scores are reliable, as shown 
by the mean score correlations which range 
from .96 to .99. 

Two of the variance indices, one based
on Aiming and the other on Number Facility,
show relatively high consistency; two, Perceptual
Speed and Speed of Closure, provide
correlation coefficients in the mid-.50s. Visualization
provides the lowest reliability coefficient,
.25. The total variance index provides
a correlation of .89, suggesting that, whatever
it measures, it is an index that can be obtained
rather consistently.

These estimates of reliability are based on 
10 scores. When one uses the Spearman-Brown 
prophecy formula, reliability estimates of the 
variability index based on 20 scores are: 
Aiming, .91; Flexibility of Closure, .58; Num- 
ber Facility, .89; Perceptual Speed, .71; 
Speed of Closure, .73; and Visualization, .40. 
One can conclude that, on Aiming and Num- 
ber Facility, variability within persons tends 
to be remarkably consistent and consistency 
of variability is found on all other tests. 
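
As a worked check on these figures, the Spearman-Brown prophecy formula for a test lengthened by a factor k is r_k = k r / (1 + (k - 1) r). The sketch below applies it with k = 2 to four of the half-length (10-form) coefficients reported in Table 1 and the text; the function name is hypothetical.

```python
def spearman_brown(r_half, factor=2.0):
    """Predicted reliability when a measure is lengthened by `factor`."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Odd/even (10-form) reliabilities of the raw-score variance index.
half_length = {
    "Aiming": 0.83,
    "Flexibility of Closure": 0.41,
    "Number Facility": 0.80,
    "Visualization": 0.25,
}
full_length = {test: round(spearman_brown(r), 2) for test, r in half_length.items()}
# full_length reproduces the reported .91, .58, .89, and .40.
```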

Berdie (1961), in a previous study, ob- 
served the reliability of a similar variance 
measure based on 10 subscores of a mathe- 
matics achievement test. The reliability coeffi- 
cients for various groups in that study ranged 
about .90. Using a somewhat similar method 
of analyses of varied repeated personality 
assessment data, Fiske (1957) reported odd/ 
even reliabilities extending from .46 to .96. 
His results also suggested that the extent of 
consistency of a person’s variability depended 
in part on the task or instrument used. 


Relationships between Variance Indices 


Table 3 shows the intercorrelations between
the six variance indices based on raw scores,
and Table 4 shows similar intercorrelations of the
seven indices, including the total variance index,
based on standard scores. In Table 3, of
the 15 correlations, 4 were significant beyond
the .01 level of probability and 1 between the .05
and .01 levels. The variances for Aiming and
Number Facility correlated .47; for Aiming
and Speed of Closure, .28; for Speed
of Closure and Number Facility, .22; and
for Number Facility and Visualization,
.27. The highest intercorrelation was found
between the two variance indices having the
highest reliability, and the intercorrelations
must be examined in light of the reliabilities
of the variance indices.


TABLE 1

CORRELATIONS FOR 100 UNIVERSITY OF MINNESOTA INSTITUTE OF TECHNOLOGY FRESHMEN ON VARIANCE INDEX
AND MEAN RAW SCORE BASED ON 10 ODD-NUMBERED FORMS AND 10 EVEN-NUMBERED FORMS
OF EACH OF SIX REPETITIVE PSYCHOMETRIC MEASURES TESTS

                                      Variance index                           Mean raw score
Test                         r    Odd M   Odd SD   Even M  Even SD      r    Odd M  Odd SD   Even M Even SD
Aiming                     .83   210.74   182.04   199.94   186.46    .96   111.40   15.64   114.34   16.13
Flexibility of Closure     .41    20.16    11.79      [?]      [?]    .97    18.29    4.77    20.10    5.16
Number Facility            .80    29.54    29.45    24.44    26.11    .99    43.04   10.34    44.46   10.59
Perceptual Speed           .55    33.28    20.63    44.12    30.67    .97    61.64    5.53    63.84    6.18
Speed of Closure           .57    69.36    33.27    52.85    28.14    .98    38.85    6.43    39.31    6.38
Visualization              .25    47.67    25.27    38.31    23.59    .98    53.84    7.80    54.28    8.02

Note.—All correlations significant beyond .01 level except one significant between .01 and .05.
[?] = entry illegible in source.


Using the uncorrected reliabilities based on
the odd- and even-numbered forms and correcting
the intertest variance correlations for
attenuation (unreliability), the correlation
between the variances for Aiming and Num-
ber Facility increases from .47 to .58, be-
tween Aiming and Speed of Closure from .28
to .41, between Number Facility and Speed
of Closure from .22 to .33, between Number
Facility and Visualization from .27 to .60.
These correlations suggest that some of the
observed independence between the variances
is due to the unreliabilities of the variance
indices.
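
The correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities. The sketch below, with a hypothetical function name, checks two of the corrected values against the uncorrected odd/even reliabilities reported above (.83 for Aiming, .80 for Number Facility, .25 for Visualization).

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores from an observed r_xy."""
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / sqrt(r_xx * r_yy)


# Aiming with Number Facility: observed .47, reliabilities .83 and .80.
aiming_nf = correct_for_attenuation(0.47, 0.83, 0.80)
# Number Facility with Visualization: observed .27, reliabilities .80 and .25.
nf_vis = correct_for_attenuation(0.27, 0.80, 0.25)
```

Rounded to two places these give .58 and .60. The low reliability of the Visualization index is what makes the second correction so large.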

The reliability coefficients themselves
are minimum estimates and one can correct
them as we have done before by applying
the Spearman-Brown prophecy formula.
Unlimited corrections of this sort to
statistical data lead to a morass of difficulty,
particularly when one is concerned with prediction,
but in this instance the concern is
with arriving at some estimate as to relationships
between variances, and these should be
based on the best reliability estimates. Table
5 shows the intercorrelations of variance indices
when the correlations are corrected for
attenuation and the reliability coefficients
used have been corrected with the Spearman-
Brown prophecy formula. This table gives an
optimal estimate of the relationships.

Recognizing the questionable assumptions
that have to be made with these two corrections
entering into the coefficients, the table
reveals that the variance on each of the tests
is to some extent related to the variance on
one or more of the other tests. The variance
index on Number Facility is significantly correlated
with the index of each of the other
five tests. The variances on Flexibility of
Closure and Speed of Closure are related to
variances on four of the other five tests. Perceptual
Speed variance is correlated to three
of the other indices, and two of the coefficients
are negative, and Aiming and Visualization
variances each are correlated with two
of the other variance indices. The Aiming and
Number Facility indices have the highest reliabilities,
are the most highly intercorrelated,
and the Number Facility index is significantly
related with each other index, although
the correlations are small. The best indication
of the variance domain may be provided by
the Number Facility and Aiming tests.


TABLE 2

CORRELATIONS FOR 100 UNIVERSITY OF MINNESOTA INSTITUTE OF TECHNOLOGY FRESHMEN OF VARIANCE INDEX
AND MEAN T SCORE BASED ON 10 ODD-NUMBERED AND 10 EVEN-NUMBERED FORMS FOR
SIX REPETITIVE PSYCHOMETRIC MEASURES TESTS AND FOR "TOTAL MEASURE"

                                      Variance index                            Mean T score
Test                         r    Odd M   Odd SD   Even M  Even SD      r    Odd M  Odd SD   Even M Even SD
Aiming                     .82    47.60    40.04    44.65    41.12    .96    49.99     [?]    50.00     [?]
Flexibility of Closure     .42    47.03    27.46    37.28    19.04    .97    50.00    7.56    50.00    8.13
Number Facility            .80    20.68    21.59    17.05    19.19    .99    50.00    9.01    50.00    9.19
Perceptual Speed           .53    42.84    33.91    37.42    32.75    .97    50.00    7.81    50.00    8.12
Speed of Closure           [?]    61.42    30.35      [?]    27.04    .98    50.00    6.65    50.00    7.24
Visualization              [?]    39.26    22.26    34.38    21.20    .98    50.00    8.02    50.00    8.29
Total Measure              .89    66.91    26.56    63.15      [?]    .99    50.00    5.79    50.00    6.10

Note.—All correlations significant beyond .01 level.
[?] = entry illegible in source.


TABLE 3

INTERCORRELATIONS BETWEEN THE SIX "VARIANCE" INDICES (USING RAW SCORES) FOR 100 UNIVERSITY OF
MINNESOTA INSTITUTE OF TECHNOLOGY FRESHMEN EACH TAKING 20 FORMS OF EACH
OF SIX REPETITIVE PSYCHOMETRIC MEASURES TESTS

Test                           FC      NF      PS      SC       V          M       SD
Aiming (A)                   -.02     .47**   .01     .28**   .01      202.04   167.48
Flexibility of Closure (FC)           [?]    -.15     .16     [?]       19.09     9.45
Number Facility (NF)                         -.18     .22*    .27**     26.58    20.11
Perceptual Speed (PS)                                 .14    -.01       38.11    21.34
Speed of Closure (SC)                                         .06       58.45    25.88
Visualization (V)                                                       41.59    18.23

*p < .05.
**p < .01.
[?] = entry illegible in source.


DISCUSSION 


These results suggest that intraindividual
variability is not specific to each task, nor
does a strongly generalized characteristic
of variability extend over a broad
variety of tasks. Rather, the conclusion is
that the variability of a person on one task
is somewhat related to his variability on certain
other tasks and, if one is to speak of such
variability for a person, one must specify the
tasks on which statements are based. If more
reliable means can be developed for observing
intraindividual variability, better defined
clusters of tasks may appear, but at present,
from among the tasks observed here, the tasks
measured by the Aiming and Number Facility
tests provide the best indicators of variability.


TABLE 4

INTERCORRELATIONS BETWEEN THE SEVEN "VARIANCE" INDICES (USING T SCORES) FOR 100 UNIVERSITY OF
MINNESOTA INSTITUTE OF TECHNOLOGY FRESHMEN EACH TAKING 20 FORMS OF EACH OF SIX REPETITIVE
PSYCHOMETRIC MEASURES TESTS (ALSO INCLUDED IS THE "TOTAL VARIANCE" INDEX)

Test                           FC      NF      PS      SC       V      TV          M       SD
Aiming (A)                    .00     [?]    -.06     .20*    .05     .42**    44.89    36.62
Flexibility of Closure (FC)           .14    -.10     [?]     .27**   .15      40.84    19.25
Number Facility (NF)                         -.18     .20*    .32**   .50**    18.23    18.38
Perceptual Speed (PS)                                -.01     .03    -.03      39.15    28.02
Speed of Closure (SC)                                         .03     .21*     54.60    25.26
Visualization (V)                                                     .28**    35.76    16.97
Total Variance (TV)                                                            64.69    25.97

[?] = entry illegible in source.


TABLE 5

INTERCORRELATIONS BETWEEN THE SIX VARIANCE
INDICES USING RAW SCORES: CORRELATIONS
CORRECTED FOR ATTENUATION (UNRELIABILITY
OF VARIANCE INDICES) USING
SPEARMAN-BROWN CORRECTED
RELIABILITY COEFFICIENTS

Test                           FC      NF      PS      SC       V
Aiming (A)                   -.03     .53     .01     .35     .02
Flexibility of Closure (FC)           [?]     [?]     [?]     .67
Number Facility (NF)                         -.23     .28     .46
Perceptual Speed (PS)                                 .20    -.02
Speed of Closure (SC)                                         .11

Note.—N = 100.
[?] = entry illegible in source.

The interpretation of these findings de- 
pends on other observations and analyses. A 
series of analyses of variance revealed that 
the test forms are not equivalent and also 
that a significant practice or learning effect 
was present, insofar as on all six of the tests 
daily mean scores for the group tended to 
increase from the beginning to the end of 
the experiment. On five of the tests there was 
no evidence that the time of day of testing 
was related to mean scores on the tests, but 
on the sixth test there was some suggestion 
that this relationship might exist. 

Fiske raised the question regarding the 
relationship between the variability index 
and the value of the mean. The correlations 
between the variance index and mean index 
for each of the tests here were: Aiming, .41; 
Flexibility of Closure, .55; Number Facility, 
.30; Perceptual Speed, -.26; Speed of
Closure, .57; Visualization, .03. Five of the
coefficients are significant; three are positive
and moderately high; one is negative. Exami- 
nation of two of the bivariate distributions 
provided no evidence that the variance indices 
were restricted at the low and high ends of 
the distributions of mean indices and the re- 
lationships appeared rectilinear. The fact that 
persons with high scores tended to be more 
variable provides some support for the hy- 
pothesis that variability extends the oppor- 
tunity for the development of adaptive be- 
havior. 

The questionnaires completed by Ss at the 
end of the experiment suggested that they 
were well motivated throughout the experi- 
ment, and 83% reported that they consist- 
ently put forth all of their effort in doing as 
well as they could. Ninety-two percent re- 
ported that they were able to work on these 
tests much more effectively on some days 
than they could on others. Eighty-eight per- 
cent reported that on the whole they enjoyed 
taking the tests. The test they enjoyed least 
was the Aiming test and the test they enjoyed 
next least was the Number Facility test, the 
two that provided the best variance indices. 

Immediately after the last form of the last 
test was administered, the students responded 
to an open-ended question asking what they 
thought the real purpose of the experiment 
was. Thirty-three percent provided the ex- 
planation given at the beginning of the 
experiment—to compare technology students 
to other students. Eighteen percent of the 
students reported that their perception of the 
purpose of the experiment was related to the 
consistency of behavior. Other reported pur- 
poses related to describing technology stu- 
dents, learning and improvement, motivation, 
and eye movements. Only four students re- 
ported that they did not know what the 
purpose of the experiment was. At the end of 
the questionnaire, students were presented 
with a checklist of five items pertaining to the 
purpose of the experiment. In responding to 
this list, 57% of the Ss checked the item,
“The experiment was concerned with the consistency
of my test behavior,” and 33% responded
that the purpose was “To determine how well
I did on these tests in relation to my fellow
students also taking the tests.” These figures
suggest that a reasonably large proportion of
the Ss had some realization that the experi- 
ment was concerned with the consistency of 
behavior but there was nothing to indicate 
that Ss for the most part were strongly moti- 
vated to behave consistently. 

A supplementary analysis suggested that 
the indices of intraindividual variability used 
here are related to the predictability of stu- 
dents’ academic behavior. For example, 
groups divided on the basis of the total vari- 
ance index derived from the 20 forms of all of 
the six tests (120 scores), into high and low 
variability groups, did differ in predictability. 
The average error of grade-point prediction 
for the high variability group was .28, for the 
low variability group .009, a difference sta- 
tistically significant between the .01 and .05 
levels. 


CONCLUSIONS 


Large and significant differences are found 
among individuals in variability of behavior 
over time. For example, one S had a mean 
score of 43 for the 20 forms of the Number 
Facility test, with a standard deviation of 
1.75, and another S with the same mean score 
had a standard deviation of 4.07. On each test 
large individual differences are found in vari- 
ability over time. 

The reliability with which these differences 
can be observed varies from task to task and 
the two tasks providing the most consistent 
variance index were the Aiming and Number 
Facility tasks. These were the two tasks that 
placed the students under the most stress in-
sofar as they were the least preferred by
the Ss. 

Variability over time on some tasks is re-
lated to variability on other tasks, but these
relationships are no more than moderate,
even taking into account the relative inade-
quacies of the means of observation, and the
highest correlation between any two of the 
variance indices was only about .50. 

If an easily observable variability charac-
teristic had been identified extending over
the six tests, one would face a difficult prob-
lem related to the highly speeded nature of
the six tests. One then would have to deter- 
mine the extent to which such a generalized 
intraindividual variability was related to 
variability in speed performance, rather than 
to variability over different tasks. The rela- 
tively small relationships among variances ob- 
served here might well be due to the common 
element of speed characterizing all of the 
tasks and one might be justified here in con- 
cluding only that to some extent the speed 
with which persons perform tasks shows some 
consistent intraindividual variability, quite 
apart from the task involved. However, all 
of the tasks were speeded and, if this intra- 
individual variability were primarily a func- 
tion of variations in speed, one would expect 
greater consistency among variance within 
tasks. 

The obtained results suggest that at least 
two of the tasks studied, Aiming and Number 
Facility, can provide adequate measures of 
intraindividual variability. The next question 
asked earlier by Fiske is, “With what are 
these individual differences in variability as- 
sociated?” 
COMPARATIVE RELIABILITY OF PICTURE FORM
AND VERBAL FORM INTEREST INVENTORIES 1

Hypothesis tested was that use of picture items in occupational interest inven-
tories results in higher reliability than is obtained using verbal items. Verbal
forms were developed to parallel published picture interest inventories. These 
verbal forms and the published picture forms were administered to randomly 
divided subgroups of high school boys and Manpower Development and 
Training Act men, following a test-retest design. Pearson product-moment 
correlations were determined for the scale scores of each group of Ss, and 
significance of the difference between correlations obtained by picture form 
and verbal form groups was then tested. A possible tendency for picture forms 
to yield higher reliabilities was considered not strong or consistent enough to 
support general claims for picture item superiority. 


Authors and publishers of occupational 
interest inventories which use pictorial items 
have claimed advantages for the picture type 
item. Some of these expected advantages are 
reflected in this statement by Geist (1959): 


It appears that, in principle, it is possible to con- 
struct picture items which are much less ambiguous 
with respect to real-life referents than are most 
verbal items. This decreased ambiguity should lead 
to higher reliability and should make higher validity 
possible [p. 414]. 


More recently and with specific reference to 
his own instrument, the Geist Picture In- 
terest Inventory (Geist Inventory), Geist 
(1964) wrote that: “Drawings increased the 
reliability of the Geist Inventory and pro- 
vided less ambiguous stimuli than are of- 
fered by verbal interest tests [p. 17].” This 
higher reliability, however, was not demon- 
strated. 

The purpose of this study was to test the 
assumption that the use of picture items in 
interest inventories will result in higher reli- 
ability than would be obtained using verbal 
items. This problem is of particular impor- 
tance since the most widely used interest 
inventories rely exclusively on verbal items. 


1 The data for this study were collected as part of 
a doctoral dissertation at the University of Missouri, 
John L. Ferguson, major advisor. 

2 Requests for reprints should be sent to the 
author, University of Omaha, 60th and Dodge, 
Omaha, Nebraska 68101. 


PROCEDURE 


An approach to testing hypotheses concerning the 
superiority of picture items has been suggested by 
Hahn (1965). In his review of the Geist Inventory, 
Hahn stated that: 


No proof is offered that the drawings represent 
“stimuli which are closer to those he experiences 
in real life.” A simple test of this assumption
would have been to offer a verbal form of the 
test with the items being the names of the occu- 
pations, or activities, represented by the drawings 
[page reference illegible in source].


In this study verbal forms were developed to 
parallel two published picture interest inventories. 
Verbal forms and picture forms were presented to 
Ss following a test-retest design, and the results 
were analyzed with reference to the relative reliabil- 
ity of verbal and picture forms. 


Instruments 


The two interest inventories used were the Geist 
Picture Interest Inventory and the California Pic- 
ture Interest Inventory (California Inventory). 
The Geist Inventory has 113 drawings of vocational 
and avocational activities and 19 drawings of ob- 
jects associated with activities. These 132 pictures 
are arranged in 44 triads with separate brief in- 
structions for each triad. Examinees select one pic- 
ture in each triad according to the instructions for 
that triad. Responses to illustrations yield scores for 
11 interest areas or scales. These scales are: 1. Per- 
suasive, 2. Clerical, 3. Mechanical, 4. Musical, 5. 
Scientific, 6. Outdoor, 7. Literary, 8. Computational,
9. Artistic, 10. Social Service, 11. Dramatic. The 
California Inventory consists of 159 pictures showing 
men engaged in various work activities. The inven- 
tory is divided into two parts. Part I presents the 
picture stimuli in 53 triads to which the examinee 
responds in forced-choice fashion indicating both 

TABLE 1

COMPARISON OF TEST-RETEST CORRELATIONS FOR VERBAL AND PICTURE FORMS
OF SCALES FROM TWO INTEREST INVENTORIES

                                                        Scale
Inventory and group      1      2      3      4      5      6      7      8      9     10     11     Mdn
Geist (a)
  Eleventh-grade boys
    Verbal (n = 61)     .62    .49    [?]    .80    .80    [?]    [?]    [?]    [?]    .44    .58    .68
    Picture (n = 56)    [?]    [?]    [?]    [?]    .76    [?]    [?]    [?]    [?]    .44    [?]    .78
  MDTA (b) men
    Verbal (n = 38)     .45    [?]    [?]    [?]    .68    [?]    .53    .63    .65    .42    .56    .63
    Picture (n = 37)    .63    .44    .19    .72    .45    [?]    [?]    [?]    [?]    [?]    [?]    .65
California (c)
  Eleventh-grade boys
    Verbal (n = 58)     [?]    [?]    .64    .87    [?]    .80    [?]    .84    .66                  [?]
    Picture (n = 51)    .86*   .92    .84*   .94*   .87*   .85    .86*   [?]    [?]                  .86
  MDTA men
    Verbal (n = 36)     [?]    [?]    .64    .74    [?]    [?]    [?]    [?]    [?]                  [?]
    Picture (n = 35)    [?]    [?]    [?]    .84    .63    .40    .64    [?]    [?]                  .65

(a) 1. Persuasive, 2. Clerical, 3. Mechanical, 4. Musical, 5. Scientific, 6. Outdoor, 7. Literary,
8. Computational, 9. Artistic, 10. Social Service, 11. Dramatic.
(b) Manpower Development and Training Act.
(c) California Inventory has only nine scales: 1. Interpersonal Service, 2. Natural (Outdoor),
3. Mechanical, 4. Business, 5. Esthetic, 6. Scientific, 7. Verbal, 8. Computational, 9. Time Perspective.
*p < .05, one-tailed.
**p < .01, one-tailed.
[?] = entry illegible in source; significance markers on illegible entries are also lost.



the most and least liked picture. In Part II, 30 of 
the pictures are repeated, this time being presented 
individually, with the examinee indicating either 
“like” or “dislike” for each picture. Responses to 
the illustrations yield scores for 9 scales. These are: 
1. Interpersonal Service, 2. Natural (Outdoor), 3. 
Mechanical, 4, Business, 5. Esthetic, 6. Scientific, 7. 
Verbal, 8. Computational, 9. Time Perspective. 
Brief verbal descriptions of the respective picture 
items are presented in both the Geist and the Cali- 
fornia Inventory manuals. These verbal descriptions 
served as a base for developing verbal items. When 
the verbal descriptions had been changed so as to 
make them appear suitable as inventory items, they 
were submitted to each of three judges. These judges 
were asked to accept or reject the verbal items on 
the basis that they were consistent with the author’s 
apparent intent as evidenced by the picture content 
and the author’s own verbal description of the item. 
All items were reworked until they were made 
acceptable to all judges. As a further check on the 
verbal items, matching tests were prepared with the 
picture items to be matched with their verbal 
counterparts. Each of these matching tests was given 
to 10 adults who had not had previous experience 
with the Geist or California Inventories. The average 
percentage of correct matching on the Geist Inven- 
tory was 99% and on the California Inventory, 98%. 


Subjects and Collection of Data 


The Ss for the study were eleventh-grade boys
from a central Missouri high school and male stu-
dents from a Manpower Development and Training
Act (MDTA) Basic Education Program in southern
Missouri.

The eleventh-grade boys were randomly divided 
and assigned to one of two groups. The first group 
was tested and retested after a 3-wk. interval with 
the picture forms. This group was designated the 
Picture Form group. Each S in this group took both
the Geist Inventory picture form and the California 
Inventory picture form on both testing and retest- 
ing. The second group was tested and retested at 
the same time as the Picture Form group. This 
second group was given the verbal forms and was 
designated the Verbal Form group. Each S in the 
Verbal Form group took both the Geist Inventory 
verbal form and the California Inventory verbal 
form on both the testing and retesting. Differences in
the size of n reported in Table 1 result from un-
usable answer sheets and absences on retesting. The
procedure described for eleventh-grade boys was 
replicated with the MDTA men with the exception 
that the time interval between testing and retesting 
was 4 rather than 3 wk. and that verbal form items 
were administered orally to the MDTA men be- 
cause of their known reading deficiencies. 


Analysis of the Data 


The Pearson product-moment correlations were 
determined for each group of Ss. Correlations were 
calculated using the raw scores for each of the 
different scales included in the inventories. The 
significance of the difference between the correla- 
tions obtained by the Picture Form and Verbal 
Form groups was then tested using Fisher’s Z′ transformation
(Johnson & Jackson, 1959). This procedure
was followed for both the Geist and the
California Inventories and for both the eleventh- 
grade boys and the MDTA men. 
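
A minimal sketch of such a comparison, assuming the usual large-sample test for two independent correlations: each r is transformed to z′ = atanh(r), and the difference is divided by the standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)). The function name is hypothetical, and the exact computational steps given in Johnson and Jackson are not reproduced in the text, so this form is an assumption. The example uses the Scale 4 correlations for eleventh-grade boys on the California Inventory from Table 1.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1, n1, r2, n2):
    """One-tailed test that r1 exceeds r2 for two independent samples."""
    z1, z2 = atanh(r1), atanh(r2)                # Fisher's z' transformation
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))   # standard error of z1 - z2
    z = (z1 - z2) / se
    p_one_tailed = 0.5 * erfc(z / sqrt(2.0))     # upper-tail normal probability
    return z, p_one_tailed


# Picture form: r = .94, n = 51; verbal form: r = .87, n = 58.
z, p = compare_correlations(0.94, 51, 0.87, 58)
```

Under these assumptions z is a little above 2 and the one-tailed probability falls between .01 and .05, consistent with the single asterisk on that entry.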


RESULTS 


Table 1 presents the findings for both the 
Geist and California Inventories and for both 
the eleventh-grade boys and MDTA men. 

On the Geist Inventory, considering both 
eleventh-grade boys and MDTA men, the 
Picture Form groups had significantly higher 
test-retest correlations than the Verbal Form 
groups on 4 of the 22 comparisons made. On 
7 of the comparisons made, the Verbal Form 
groups had the higher correlations, but on 
none of these did the difference reach signifi- 
cance. 

On the California Inventory, considering 
both eleventh-grade boys and MDTA men, 
the Picture Form groups had significantly 
higher correlations on 6 of the 18 compari- 
sons made. On 4 of the 18 comparisons made, 
the Verbal Form groups had the higher cor- 
relations, and 1 of these, Scale 6 for MDTA 
men, was significantly higher. 

An examination of Table 1 reveals an ad- 
ditional finding which may or may not be 
directly relevant to this study but is never- 
theless worth noting. On both the Geist and 
California Inventories there was a definite 
tendency for the eleventh-grade boys to ob- 
tain higher correlations than those obtained 
by the MDTA men. On 35 of the 40 comparisons
that can be made between the eleventh-grade
boys and MDTA men, the boys obtained the
higher correlations.


CONCLUSION 


Considering the results for both MDTA 
men and eleventh-grade boys on both the 
Geist and California Inventories, in all but 
one instance where a significant difference in 
reliability was found this difference favored 
the picture form. However, for the majority 
of scales the reliability of the picture forms 
was not significantly higher than for the 
verbal forms. Although there was an apparent 
tendency for the picture form scales to yield 
higher reliabilities, this tendency was neither 
strong nor consistent, and general claims for 
the superior reliability of picture items in 
interest inventories do not seem justified 
without further evidence. 
COMPLEX VIGILANCE:

Human monitors were required to detect additions and deletions of experi-
mentally defined relevant signals which were presented via a computer on an
8 × 8 matrix display. The ratio of relevant to irrelevant stimuli on the display
(20:10, 15:15, 10:20) and ratio of relevant to irrelevant signals or changes
(40:20, 20:40 per 100-min. period) were investigated. Vigilance decrements
were found in the detection of omit signals with the greatest decrements
occurring in the experimental condition where the proportion of relevant to
irrelevant signal changes was smallest.


Recent advances in sensor and computer 
technology require the human observer to 
monitor highly complex and periodically
changing displays. These advances have 
changed the observing task that originally 
spurred Mackworth (1950) to use the Clock 
Test as a laboratory task to investigate moni- 
toring performance. In referring to the 
changes in task characteristics, Kibler (1965) 
has suggested that the data collected in clas- 
sical vigilance studies may not be applicable 
to contemporary monitoring problems because 
of changes in the signal characteristics and in 
the human’s response requirements. Specifi- 
cally, he noted that weak, brief duration sig- 
nals are rarely encountered and that the hu- 
man is required to monitor multiple informa- 
tion sources. 

While the typical performance of observers
in a simple display situation has been established
as a monotonic decline in detected signals
with increased duration of watch, there
have been few definitive statements about the
conditions which lead to performance decrements
in complex display situations like those
described by Kibler. Indeed, studies investigating
complex monitoring behavior did not
find consistent decrements in average human
performance (Adams, 1963), and those latency
decrements that did occur were considered
trivial (Montague, Webber, & Adams,
1965). However, recent studies (Howell,
Johnston, & Goldstein, 1966; Johnston, Howell,
& Goldstein, 1966) with displays more
complex than those used by Adams have
identified two experimental conditions which
produce latency decrements: low signal frequency
(the number of signals presented in a
session) and high stimulus density (the
average number of stimuli on the display at
any given time). However, these findings were
obtained under conditions in which all stimuli
and signals were relevant and thus do not
provide information on a monitoring situation
where both relevant and irrelevant information
are displayed.


1 This study was carried out in the Human Performance
Center at the Ohio State University and
was supported by the Air Force Systems Command,
Research and Technology Division, Rome Air Development
Center, Griffiss Air Force Base, New York
13442, under Contract No. AF 30(602)-3622 with
the Ohio State University Research Foundation.

2 Requests for reprints should be sent to Irwin L.
Goldstein, Department of Psychology, University of
Maryland, College Park, Maryland 20740.

3 Now at Rice University.

The present study was designed to investi- 
gate just such a complex situation, that is, 
one in which overt responses are required to 
arbitrarily defined relevant signals but not to 
irrelevant signals. A complex format of this 
sort permits the exploration of some impor- 
tant parameters of signal frequency and stim- 
ulus density. If decrements in attention to a 
complex display occur only when signal fre- 
quency is low, it becomes meaningful to ask 
whether the important aspects of frequency 
are the number of overt responses (relevant 
signals) or the amount of stimulus change
(relevant and irrelevant signals) occurring 
per unit time. Similarly, it becomes important 
to learn whether the critical aspect of stimu- 
lus density is the total number of stimuli or 
just the number of relevant stimuli. 


METHOD 
Subjects 


The Ss were 48 male undergraduate students divided
equally into six experimental conditions. Practiced
Ss were used to minimize confounding of learning
and vigilance effects.


Apparatus and Stimulus Material 


All programming of input and scoring of responses 
were performed using a combined IBM 1401/7094 
digital computer system. Four cathode-ray-tube dis- 
play consoles were linked directly to this system in 
such a way that four Ss could serve simultaneously 
under independent experimental conditions. The display
consisted of a 5-in. sq. 8 × 8 matrix. Each of
64 cells contained either a relevant letter trigram, an 
irrelevant one, or nothing. Each trigram contained 
either an A or an E as the middle letter with the 
remaining letters chosen randomly from the rest of 
the alphabet. For one half the Ss the A trigrams 
were relevant, and for the other half the E trigrams 
were so designated. A signal was defined as either the 
appearance of a new trigram in a previously empty 
cell (add signal) or the disappearance of an old one 
(omit signal). If the change involved a relevant 
stimulus, it was a relevant signal; otherwise, it was 
irrelevant. 
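
The signal definitions above can be restated as a short Python sketch, offered only as an illustration. The representation of a cell as either None (empty) or a three-letter string, and the function names, are assumptions made for this example; the original programming on the IBM 1401/7094 system is not described at this level of detail.

```python
from typing import Optional, Tuple


def is_relevant(trigram: str, relevant_vowel: str) -> bool:
    """A trigram is relevant when its middle letter is the vowel assigned to S."""
    return len(trigram) == 3 and trigram[1] == relevant_vowel


def classify_signal(before: Optional[str], after: Optional[str],
                    relevant_vowel: str) -> Tuple[str, str]:
    """Classify a single cell change as (add or omit, relevant or irrelevant)."""
    if before is None and after is not None:
        kind, trigram = "add", after       # new trigram in an empty cell
    elif before is not None and after is None:
        kind, trigram = "omit", before     # old trigram disappears
    else:
        raise ValueError("a signal is either an appearance or a disappearance")
    relevance = "relevant" if is_relevant(trigram, relevant_vowel) else "irrelevant"
    return kind, relevance


# For an S whose relevant trigrams have A as the middle letter:
print(classify_signal(None, "KAT", "A"))   # appearance of an A trigram
print(classify_signal("BEX", None, "A"))   # disappearance of an E trigram
```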


Experimental Design and Signal Programming 


At any given time there were approximately 30 
stimuli on the display, and in the course of a 100-min. 
watch period 60 changes (signals) occurred involving 
this information. Using this scheme, two variables in 
a factorial design were investigated: ratio of relevant 
to irrelevant stimuli on the display (20:10, 15:15, 
10:20) and ratio of relevant to irrelevant signals or 
changes (40:20 per 100-min. period, and 20:40). In 
order to maintain the average density values, adds 
and omits occurred with equal frequency over a 
session. The other characteristics of the signals were 
determined on a random basis including: the selec- 
tion of the first and third letter of each trigram 
from a pool of all possible combinations; the spe- 
cific stimuli added and omitted; the order of oc- 
currence of adds and omits; and the selection of 
cells for placement of signals with the obvious 
restriction that the cell had to be empty for a signal 
to be added and had to be occupied for a signal to 
be omitted. Only one stimulus could occupy a cell at 
any given time and no more than one could change 
at once. The intersignal intervals varied normally 
around a mean of 100 sec. and standard deviation of 
10 sec. with the restriction that all values were 
multiples of 10 sec. 
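
As a rough illustration of the timing rule just described, the following Python sketch draws 60 intersignal intervals from a normal distribution (mean 100 sec., standard deviation 10 sec.) and restricts them to multiples of 10 sec. Rounding each draw to the nearest multiple of 10 is an assumption; the source states the restriction but not how it was imposed. The function name and the seed are hypothetical.

```python
import random


def intersignal_intervals(n_signals=60, mean=100.0, sd=10.0, step=10, seed=0):
    """Draw intervals (sec.) from N(mean, sd), snapped to multiples of `step`."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_signals):
        raw = rng.gauss(mean, sd)
        snapped = int(round(raw / step)) * step
        # Guard against a non-positive interval in the extreme tail.
        intervals.append(max(step, snapped))
    return intervals


intervals = intersignal_intervals()

# Cumulative onset times of the 60 signals within the watch period.
onsets = []
elapsed = 0
for gap in intervals:
    elapsed += gap
    onsets.append(elapsed)

print(len(intervals), "signals; last onset at", onsets[-1], "sec.")
```

Because the intervals are random, the last onset only approximates the nominal 6,000 sec. (100 min.) of the watch period; how the original program reconciled the two is not stated.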


Procedure 


Each S completed a 100-min. practice and a 100- 
min. experimental session. The 48 Ss were divided 
into the six experimental conditions. Four Ss per- 
formed simultaneously but were under different ex- 
perimental conditions. The Ss were instructed to 
remain seated, quietly but awake, for the entire 100 
min. of each session. With no exceptions Ss fol- 
lowed their instructions faithfully. 

The Ss were informed that stimuli might either 
appear or disappear and that they were to respond 
only to relevant signals by pushing immediately a 
detect button beneath their left hand. The Ss were 
also required to illuminate the cell in which the 
signal occurred with a light pencil held in the right 
hand. This latter procedure prevented S from using 
the detect button indiscriminately and also
permitted the authors to match up detections and
signals. One important characteristic of this task was
the relative persistence of signal states. When a 
signal occurred in any cell, the new state of that cell 
(i.e., signal present or absent) remained in effect 
indefinitely, subject only to the rules of random 
selection which governed the choice of any cell for 
change. If S detected a signal but could not locate
where it occurred, he could push the DNO (Don’t 
Know) button. These kinds of partial responses 
were very rare. The computer provided immediate 
confirmation to S after each step in the response 
sequence so that S could determine if his responses 
were properly recorded. 


RESULTS AND DISCUSSION 


An earlier study (Howell et al., 1966) 
found that the monitoring behaviors for adds 
and omits are quite distinct, seemingly in- 
volving different search and memory proc- 
esses. Furthermore, detection of omits depends 
on short-term memory to a greater extent 
than does detection of adds. Therefore, the 
data for the two signal types were analyzed 
separately. For a given density condition, the 
signals of each type were organized into two 
blocks of 10 signals each for the 40:20 fre- 
quency condition, and into two blocks of 5 
signals each for the 20:40 frequency condi- 
tion. An ordinal time scale in the form of sig- 
nal blocks was used in order to avoid possible 
artifactual latency measurements produced by 
an uneven distribution of the relatively small 
number of signals across time blocks. The 
relative persistence of signals permitted the 
examination of the effects of the independent 
variable on detection latency without setting 
an arbitrary upper boundary which would 
limit all scores to a certain range. The median
latency scores were chosen as the most sensitive
measure of individual monitoring performance
because these scores were not affected
by extensively long latencies which occasionally
occurred under all conditions.
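
The choice of the median can be illustrated with a small Python sketch. The latencies below are hypothetical values for one S's 10 omit signals in the 20:40 condition, grouped into two ordinal blocks of 5 signals as described above; the function name is likewise an assumption.

```python
from statistics import mean, median


def block_medians(latencies, block_size):
    """Median latency for consecutive blocks of signals in presentation order."""
    return [
        median(latencies[i:i + block_size])
        for i in range(0, len(latencies), block_size)
    ]


# Hypothetical latencies (sec.); the fifth signal was detected very late.
latencies = [6.0, 7.5, 8.0, 9.0, 95.0, 10.0, 12.0, 13.5, 15.0, 16.0]

medians = block_medians(latencies, block_size=5)
first_block_mean = mean(latencies[:5])

print("block medians:", medians)
print("mean of first block:", first_block_mean)
```

In this example the single long latency raises the first-block mean well above every other score in that block, whereas the block median is unaffected.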

An analysis of variance of the data for adds 
revealed that none of the experimental sources 
of variance was significant (p > .05). This
finding, contrasted with the significant effects 
found with the omit signals (discussed be- 
low), points to the complicated processes 
underlying monitoring behavior with complex 
displays. The present data support earlier
statements (Johnston et al., 1966) that there 
are very different search and storage opera- 
tions for add and omit signals. In this par- 
ticular case the display was filled with an 
average of 30 different trigrams which makes 
storage operations very difficult. Therefore, S 
was dependent upon a continual search pat- 
tern which appears to be very sensitive to add 
signals. The Ss actually detected 91% of the 
add signals compared with only 81% of the 
omit signals. This has led to speculation that 
add signals are somewhat more attention 
demanding than omit signals,⁴ and that this
superiority, whatever its source, apparently 
obscured the effects of the independent varia- 
bles on add signals. More experiments with 
emphasis on search and storage operations 
are planned on this apparent difference in 
sensitivity to adds and omits. 

An analysis of variance on the latency 
scores for omit signals indicated that: the 
signal ratio or frequency effect was significant 
(p < .01); latencies increased significantly 
over blocks (p < .05); and the Blocks ×
Frequency interaction just missed significance
(F = 4.07 was required for p < .05 and F =
3.93 was obtained). The density variable and
all of its interactions were not significant
(p > .05).

As can be seen from an examination of
Table 1, these data reveal that there was an
overall decrement in performance for omit
signals. Also, it is apparent that the latency
scores for the 20:40 relevant/irrelevant frequency
condition are higher than for the 40:20
condition. Table 1 also indicates that the main
decrement occurred in the 20:40 condition.
This was supported by a Newman-Keuls test
on the Frequency × Block interaction which
showed that Block 2 was significantly different
from Block 1 for the 20:40 condition
(p < .05) but not for the 40:20 condition
(p > .05). The analysis also revealed that the scores
for the 20:40 and 40:20 conditions were significantly
different at Block 2 (p < .05) but
not at Block 1 (p > .05).


TABLE 1

MEDIAN DETECTION LATENCY SCORES FOR TWO SIGNAL
BLOCKS OBTAINED FOR THE ADD AND OMIT SIGNALS
UNDER 20:40 AND 40:20 RELEVANT/
IRRELEVANT FREQUENCY

                      Block
Signal            1            2

Omit
  20:40          9.40        14.11
  40:20          6.97         7.67
Add
  20:40      [illegible]      5.24
  40:20          4.06         4.84


4 Though the physical characteristics of the equipment
do not account for this concept, subjective
reports from Ss and Es support this view.

The present data for omit signals support 
Jerison and Pickett’s (1964) reinforcement 
theory which assumes that signals are rein- 
forcers, that the monitoring task is composed 
of observing responses, and that performance 
improves with increases in the percentage of 
observing responses reinforced. Also, these 
data appear to indicate that lack of a per- 
formance decrement with high frequency is 
not accounted for by the amount of stimulus 
change occurring per unit time as the novelty 
theory might predict, but rather by the
reinforcement of observing responses to relevant
stimulation.

The lack of any effects involving stimulus 
density is puzzling. It is possible that the 
critical aspect of density is the total number 
of stimuli on the display (which was the same 
for all conditions) rather than the number of 
relevant stimuli. It is also possible that there 
were two offsetting processes that obscured 
any density effect. As you change from a 
10:20 density to a 20:10 density, you increase 
the number of stimuli to be stored which 
should inhibit performance; but you also 
decrease the amount of irrelevant stimulation
on the display which should improve
scanning behavior and thus enhance performance.

In summary, the present study examined 
the performance of human Ss in monitoring 
complex displays. Decrements to omit signals 
were observed with the main source of the 
decrements being the 20:40 frequency condi- 
tion. It was under these conditions that moni- 
tors received the smallest number of reinforc- 
ing signals and the largest amount of extrane- 
ous stimulation. The effects of density and 
add versus omit signals require further work 
before any systematic statements can be 
offered about their effects upon human per- 
formance. 
PREDICTION OF JOB SUCCESS FOR HOSPITAL AIDES
AND ORDERLIES FROM MMPI SCORES AND
PERSONAL HISTORY DATA¹


JAMES N. McCLELLAND² AND FEN RHODES


California State College at Long Beach 


The MMPI and selected personal history data were evaluated as predictors 
of job success for hospital aides and orderlies. Thirteen individual performance 
measures and a weighted composite criterion were employed for each job. All 
but 1 of the 10 basic MMPI scales and 10 of 13 biographical predictors showed 
small correlations with at least 1 of the individual criterion measures. Corrected 
multiple Rs of .48 were obtained between each weighted composite criterion 
and combined test and biographical predictors. The biographical data proved 
relatively more useful in this prediction. Factor analyses of individual cri- 
terion measures yielded similar factor structures for both jobs. 


The problem of selection of nurses’ aides
and orderlies in hospitals has become increasingly
important in recent decades. Not
only do these auxiliary personnel comprise a
significant proportion of the hospital staff, but
also high turnover rates within their ranks
and inefficient performance on the job make
the work they perform more costly than
necessary. Proper selection of individuals to
fill these positions is, therefore, of considerable
importance. It was the purpose of this
study to investigate the usefulness of the
MMPI and biographical data from application
forms in selecting these personnel.
Criteria for judging adequacy of selection
methods were various performance ratings
and attendance and tenure data.

Both MMPI and autobiographical data have
been used successfully for the selection of personnel
in hospitals and in industry. Research
most clearly related to the present study
includes use of the MMPI by Rowe (1957)
to predict success of psychiatric aides, Hovey
(1956) to predict success in the clinical training
of student nurses, Butterfield and Warren
(1962, 1963) to predict performance and job
tenure of psychiatric aides, and Bessent and
Gloye (1967) to predict job performance and
washout rate of mental hospital technicians.

Autobiographical data, collected by means 
of job application forms, have been used 
widely in industry to predict both job per- 
formance and job tenure. Recent representa- 
tive studies include use of such data by 
Scollay (1957) to predict success of assistant 
district sales managers, by Kirchner and 
Dunnette (1957) to reduce turnover in a 
variety of office jobs, by Fleishman and 
Berniger (1960) to predict tenure of office 
workers at Yale University, by Walther 
(1961) to predict performance and turnover 
among secretaries, code clerks, and mail and 
record clerks, and by Buel (1965) to predict 
creativity of research personnel. 


METHOD 
Subjects 


The Memorial Hospital of Long Beach, California, 
is a private 550-bed hospital. Since 1961 the hos- 
pital has administered the MMPI to all its nurses’ 
aides and orderlies, either before or immediately 
after employment. The total pool of individuals 
thus made available for the study comprised 111 
nurses’ aides and 100 orderlies. Of these, 37 aides 
and 23 orderlies were still employed at the time 
of the study. 

TABLE 1

MEANS OF BIOGRAPHICAL PREDICTOR VARIABLES

Predictor variable                        Aidesᵃ        Orderliesᵇ

Age                                        37.4           23.0
Marital status (proportion)
  Single                                    .15            .18
  Married                                   .49            .20
  Divorced                                  .26            .02
  Widowed                                   .10            .00
No. dependents                          [illegible]        24
Yr. education                              11.3        [illegible]
Proportion with health impairment       [illegible]        .18
Average tenure on past jobsᶜ               29.3           14.0
Related experienceᶜ                     [illegible]       15.9
Salary difference: Present versus
  previous job (dollars per mo.)           9.39         -14.39
Proportion with restriction on hr.
  available for duty                        .49            .39
Length of time local residentᶜ             38.8           47.8

ᵃ N = 72.
ᵇ N = 54.
ᶜ In months.


All aides were females and all orderlies were males.
In addition to the difference in sex, the following
distinctions between the two occupations were
deemed significant enough to keep the groups separate
throughout most of the study:

1. Nurses’ aides are expected to give close support 
and attention to a small number of patients on a 
single ward or floor, whereas orderlies have a much 
broader assignment that takes them throughout the 
hospital. 

2. Only orderlies perform certain functions, such 
as catheterizations for male patients. 

3. Aides make decisions that are more crucial, and,
hence, they are expected to exhibit judgment and
initiative at a higher level than are orderlies.

4. Because their tasks are more varied, orderlies 
have more responsibility for scheduling their work. 

Job-performance data and biographical data were 
not complete for all Ss in the pool. Consequently, 
the number of cases employed varied for different 
parts of the study. However, the number of Ss in 
either classification, aides or orderlies, available for 
any one part of the study was never less than 54. 
Actual numbers of individuals employed for each 
part of the study are indicated in the body of 
the report. 


Predictor Variables 


Scores for a total of 18 MMPI scales were tested 
for validity as predictors. Scales used were the basic 
10 (Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma, Si), plus 
L, F, K, and five scales corrected by the K factor
(Hs + K, Pd + K, Pt + K, Sc + K, Ma + K). The
following biographical information was obtained
from application blanks (Table 1):

1. Age at time of employment. Ages ranged from 
18 to 57 for aides and from 18 to 51 for orderlies. 

2. Marital status. Recorded as 1 or 0 for single
versus not single, married versus not married, 
divorced or separated versus not, and widowed 
versus not widowed. 
3. Number of dependents.

4. Education in years. 

5. Health impairment. Recorded as absence (0) 
or presence (1) of disability other than eyeglasses. 

6. Average tenure (months) on past jobs. 

7. Related experience. Recorded in years for per- 
sons reporting previous jobs in related fields. 

8. Salary difference. Computed as present minus 
last previous job in dollars per month. 

9. Restriction on hours available for duty. Re- 
corded as available on all shifts (0) or only at 
specified times (1). 

10. Number of months resident of local area. 


Approximately half of those in the aide group 
had taken the MMPI before employment and half 
shortly afterwards. Since the potential effect on 
test scores caused by this difference in administration 
is well known (e.g., Green, 1951), it was deemed 
necessary to test for differences in responses made 
by Ss who had taken the MMPI as applicants and 
those who had taken it as employees. A t test
for significance of differences between mean scores
showed no significant difference (p > .05) on any of the 18
scales. Consequently, no distinction was made be- 
tween aides who took the MMPI before and those 
who took it after employment. 


Criterion Measures 


Criterion data for which prediction was sought 
comprised supervisory ratings on a series of 5-point 
performance scales plus data on attendance and job 
tenure. The ratings, completed at normal review 
periods, were described on the rating form as: 
(a) quality of work (accuracy), (b) volume of 
work, (c) ability to follow directions (instructions), 
(d) acceptance of responsibility (pressure, initiative, 
judgment), (e) effective use of time, equipment and 
supplies, resourcefulness, ability to organize, (f) per- 
sonal grooming (hygiene, appearance, appropriateness 
of dress), (g) attendance record (tardiness, absen- 
teeism), (h) observance of confidential nature of 
patient and hospital business, and (i) relationship
with others (co-workers, patients, public, visitors), 
courtesy, cooperativeness. 

Other criterion data used were number of months 
employed by the hospital at the time of the study 
or at termination, average number of hours absent 
from work per month, average number of absences 
per month, and presence or absence of recorded 
instances of aberrant behavior detrimental to job 
performance. 

In addition to the individual criterion measures 
described above, a weighted composite criterion was 
developed to provide a single measure of overall 
job performance. To arrive at appropriate weights 
for the different aspects of job performance for 
which quantitative data were available, 20 super- 
visors were asked to distribute a total of 100 points 
among 10 selected job-performance characteristics 
based upon the perceived importance of each 
characteristic with respect to overall job perform- 
ance. Aide and orderly positions were evaluated 
separately by the 20 supervisors. There was a minimum
interval of 1 wk. between rating of the first
and second jobs, and supervisors were not told at
the beginning that they would even be rating a
second job. Carry-over effects still remaining were
counterbalanced by having half of the supervisors
assign weights first for aides and the other half
first for orderlies. Average weights assigned for each
of the 10 performance characteristics included in the
composite criterion are shown in Table 2. The
weights for hours absent and frequency of absences
were changed to negative values. Weights were
multiplied by individual ratings for each employee
expressed in standard score form and the products
summed to provide a composite job-performance
score.
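
The construction of the composite criterion can be sketched in Python as follows. The three weights are those listed for aides in Table 2; the ratings for three employees are hypothetical, and the use of the sample standard deviation in forming standard scores is an assumption, since the source does not say which form was used.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standard scores relative to the group mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_scores(ratings, weights):
    """Weighted sum of standard scores for each employee.

    ratings: {characteristic: [one value per employee]}
    weights: {characteristic: supervisor-assigned weight}
    """
    n_employees = len(next(iter(ratings.values())))
    totals = [0.0] * n_employees
    for name, values in ratings.items():
        for i, z in enumerate(z_scores(values)):
            totals[i] += weights[name] * z
    return totals


# Subset of the aide weights from Table 2; hours absent carries a negative weight.
weights = {"quality_of_work": 22, "volume_of_work": 14, "hr_absent": -4}

# Hypothetical data for three employees.
ratings = {
    "quality_of_work": [3, 4, 5],
    "volume_of_work": [2, 3, 4],
    "hr_absent": [6.0, 2.0, 4.0],
}

scores = composite_scores(ratings, weights)
print(scores)
```

Because each characteristic is standardized within the group, the composite scores sum to zero and express each employee's standing relative to the others rather than on an absolute scale.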


Data Analysis


Zero-order correlations between all of the MMPI
scales and each of the 13 job-performance criteria
were obtained for both aides and orderlies. Correlations
between biographical data items and individual
criterion measures were similarly computed for both
job groups. In addition all test and personal history
predictors were correlated with the weighted composite
criterion measures developed for aides and
orderlies on the basis of supervisory judgments.

Multiple linear regression analyses were performed
for each of the job groups using in turn the MMPI
scales and the biographical data items as predictors
of each of the 13 individual job-performance measures.
Composite weighted criterion performance was
predicted for each job group by a linear regression
analysis employing the entire set of test and nontest
predictor variables.

Finally, separate principal-components factor
analyses were carried out for aides and orderlies
using 13 of the individual criterion measures for
the purpose of identifying the number and types of
independent factors operating in observed performance
on these two jobs. A Varimax rotation was
performed on the obtained matrices.


RESULTS³


Zero-order correlations between MMPI
scores and the 13 job-performance measures
were generally small, and most were statistically
insignificant (p > .05). Fourteen percent
were significant for aides, with a median
significant r of .25. For orderlies only 9%
were significant, with a median of .26. All
criterion measures but three, however, could
be significantly predicted at least minimally
from one or another MMPI score. The three
exceptions were tenure, attendance as rated
by supervisors, and relationship with others
(also evaluated by supervisors), none of
which showed significant correlations with
MMPI predictors for either job category. In
addition, for orderlies, rated discretion of employees
regarding patient and hospital business
could not be predicted from the MMPI.
The objective measures of absenteeism (i.e.,
hours absent and frequency of absences) were
among the most predictable criterion variables
for both groups and were the most
predictable for aides. Of the 18 MMPI scales,
only Si failed to correlate significantly with
any criterion measure for either job group.


3 Tables for both job groups giving zero-order and
multiple correlations of all predictors with individual
criteria have been deposited with the National Auxiliary
Publications Service. Order Document No.
[illegible]123 from National Auxiliary Publications Service
of the American Society for Information Science, c/o
CCM Information Sciences, Inc., 22 West 34th
Street, New York, New York 10001. Remit in advance
$3.00 for photocopies or $1.00 for microfiche
and make checks payable to: Research and Microfilm
Publications, Inc.


TABLE 2

RATED IMPORTANCE OF DIFFERENT ASPECTS OF JOB
PERFORMANCE FOR AIDES AND ORDERLIES

                                      Assigned weightᵃ
Performance characteristic           Aides     Orderlies

Hr. absent                             -4          -3
Frequency of absences                  -4          -5
Quality of work                        22          21
Volume of work                         14          13
Ability to follow directions           17          16
Acceptance of responsibility           10           9
Resourcefulness                        11          11
Personal grooming                       6           7
Relationship with others               10          12
Tenure                                  2           3

ᵃ Represents average of ratings by 20 supervisors familiar
with both jobs.


Relationships between biographical data
predictors and individual performance criteria
were somewhat larger compared with those
for the MMPI (median significant r of .28
for aides, .34 for orderlies), but the total
number of significant (p < .05) correlations
in each job category was considerably reduced
(6% for aides, 4% for orderlies). Two criterion
measures, namely, ability to follow
directions and relationship with others, could
not be significantly predicted in either job
group from biographical data. Hours absent
was correlated significantly for aides with
being divorced (.28) and with health impairment
(.37). Frequency of absences was correlated
significantly with being single (-.24)
or divorced (.23), with number of dependents
(.29), and with health impairment (.35). For
orderlies, however, neither absenteeism measure
correlated significantly with any biographical
predictor. Three biographical predictors,
namely, widowed, education, and restriction
on hours available for duty, did not
correlate significantly for aides or orderlies
with any job-performance measure.


TABLE 3

CORRELATIONS BETWEEN ALL PREDICTORS AND
COMPOSITE CRITERION

Predictor                                   Aides       Orderlies

MMPI
  L                                         -.23*          .08
  F                                         -.11           .09
  K                                         -.23*          .05
  Hs                                        -.02           .14
  D                                         -.12           .20
  Hy                                        -.30*         -.01
  Pd                                        -.18          -.10
  Mf                                         .04           .10
  Pa                                        -.08          -.02
  Pt                                     [illegible]       .06
  Sc                                         .01          -.02
  Ma                                         .01          -.22
  Si                                         .07           .16
  Hs + K                                    -.21           .18
  Pd + K                                    -.28*         -.05
  Pt + K                                    -.12           .14
  Sc + K                                    -.20           .04
  Ma + K                                    -.03          -.20

Biographical
  Age                                        .02           .24
  Single                                 [illegible]       .08
  Married                                [illegible]      -.01
  Divorced                                  -.28*         -.23
  Widowed                                   -.15           .00
  No. dependents                            -.18       [illegible]
  Education                              [illegible]       .14
  Health impairment                         -.02           .17
  Average tenure on past jobs                .16           .23
  Related experience                        -.08           .19
  Salary difference: Present versus
    previous job                            -.15          -.03
  Restriction of hr. available
    for duty                                 .12           .01
  Length of time local resident             -.01          -.28*

Note. Aides, N = 72; Orderlies, N = 54.
*p < .05.


Correlations of both sets of predictor variables,
that is, MMPI and biographical data,
against the composite weighted criterion developed
for aides and that developed for
orderlies are listed in Table 3. From the
total number of 31 predictors, only 5 for the
aides and 1 for the orderlies correlated significantly
(p < .05) with composite measured
job performance.


TABLE 4

MULTIPLE CORRELATION BETWEEN ALL PREDICTORS
COMBINED AND COMPOSITE CRITERION

              Obtained   Corrected
Group            R           R        Predictors includedᵃ

Aidesᵇ       [illegible]    .48       Hy (-2.6), divorced
                                      (-58.5), widowed
                                      (-81.3), past tenure
                                      (1.0), K (-4.1)
Orderliesᶜ      .54*        .48       Local residence (-0.1),
                                      education (35.6), age
                                      (12.7), divorced (-390)

Note. Relative size of raw-score weight from regression
analysis appears in parentheses after each scale.
ᵃ In order of relative contribution to total prediction.
ᵇ N = 72.
ᶜ N = 54.
*p < .05.

Table 4 shows the multiple correlations 
obtained when MMPI scores and biographical 
data were combined as predictors of the 
composite performance measures for aides and 
orderlies. In calculating these Rs (and those
in NAPS Tables C and D) a stepwise multiple
regression procedure was employed, which
began with the best single predictor variable
and successively selected additional variables
so that the maximum increase in predictive
power would be realized as each new variable
was brought in. The process was terminated
in every case when the F ratio computed
for the predictor to be added next failed
to reach significance at the .25 level, indicating
that inclusion of further variables
would not meaningfully improve the multiple
R obtained thus far. All predictor variables
passing this criterion are listed in the tables.
The corrected or “shrunken” R shown for
each obtained R is an unbiased estimate of


TABLE 5

FACTOR MATRIX FOR 12 JOB-PERFORMANCE
MEASURES: AIDES

                                            Factor
Performance measure               I       II      III          IV
Hr. absent                       -.07     .88    -.15         -.05
Frequency of absences            -.17     .88    -.11         -.12
Aberrant behavior                -.09    -.09    -.43         -.13
Quality of work                   .49    -.18     .20          .64
Volume of work                    .59    -.21     .18         [illegible]
Ability to follow directions      .69    -.15     .07          .16
Acceptance of responsibility      .43    -.16    [illegible]   .64
Resourcefulness                   .52    -.10     .16         [illegible]
Personal grooming                 .15    -.13     .45          .49
Attendance (rated)                .44    -.55     .04          .28
Discretion re patient and
  hospital business               .42    -.35     .10          .22
Relationship with others          .65    -.12     .26         [illegible]

Note.—Factors shown are orthogonal and were obtained 
using a principal-axes solution with Varimax rotation. N = 90 
the population correlation. As such, it
approximates the size R that would be obtained
if the regression weights derived for the 
present employee sample were applied to a 
second, cross-validation sample. 
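
The shrinkage correction described above can be illustrated with a short calculation. This is only a sketch: it assumes the familiar Wherry-type adjustment, R_adj² = 1 − (1 − R²)(N − 1)/(N − k − 1), whereas the text does not say which formula was actually applied, and the function name `shrunken_r` is hypothetical. Under that assumption, the values given for orderlies in Table 4 (obtained R = .54, N = 54, four predictors retained) yield the tabled corrected R of .48.

```python
import math


def shrunken_r(r_obtained: float, n: int, k: int) -> float:
    """Return a shrinkage-corrected multiple R (Wherry-type adjustment).

    Assumed formula (not stated in the source):
        R_adj^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
    where n is the sample size and k the number of predictors retained.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    r2_adj = 1.0 - (1.0 - r_obtained ** 2) * (n - 1) / (n - k - 1)
    # A negative adjusted R^2 is conventionally reported as zero.
    return math.sqrt(max(r2_adj, 0.0))


# Orderlies in Table 4: obtained R = .54, N = 54, 4 predictors retained.
print(round(shrunken_r(0.54, 54, 4), 2))  # 0.48
```

The correction grows as the number of predictors approaches the sample size, which is why a stepwise procedure that admits only a few variables loses relatively little on shrinkage.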

Tables 5 and 6 show the results of the
factor analyses of the individual criterion
measures, conducted separately for each job.
Tenure was not included in these
analyses, although it was used as a criterion 
variable in other parts of the study, because 
this variable was assigned for both aides and 
orderlies a very low weight by supervisors 
relative to the judged importance of other 
aspects of job performance (Table 2). It was 
considered undesirable and possibly mislead- 
ing to include in this instance a variable 
rather clearly established as having little 
bearing on how well an incumbent performs 
a job. Factors were extracted by the method 
of principal components, using estimated 
communalities (R²s) in the diagonal of each
matrix. Orthogonal rotations were performed 
by the Varimax technique. Factors retained 
for rotation accounted for at least 1% of the 
total variance. 


DISCUSSION 


In general, findings of this study are simi- 
lar to many in which a sampling of biographi- 
cal data, a personality measure, or both, have 
been used to predict job-performance criteria. 
For example, the corrected multiple Rs of .48 
reported here (Table 4) are very close to the 
average multiple Rs of approximately .50 
obtained by Rowe (1957), who used bio- 
graphical and MMPI data to predict a com- 
posite job-performance criterion for psychi- 
atric aides. Using several individual bio- 
graphical predictors combined on a logical 
basis, Scollay (1957) obtained biserial cor- 
relation coefficients of .32 and .23 in pre- 
dicting managerial success expressed on a 
dichotomous scale. 

Various studies tend to reveal some con- 
sistency in the relative effectiveness of the 
MMPI scales in predicting job performance. 
In this study the scales most often found 
useful as predictors both of individual and 
composite criteria were Hs, Pd, Mf, Ma, and 
F. On the other hand Butterfield and Warren 
(1963) found the K, Pd, and Ma scales most 


TABLE 6

FACTOR MATRIX FOR 12 JOB-PERFORMANCE
MEASURES: ORDERLIES

                                                  Factor
Performance measure               I            II     III          IV           V
Hr. absent                       -.08          .78    .02         -.02         -.15
Frequency of absences            -.06          .78    .02         -.13         -.18
Aberrant behavior                -.01          .18   -.11         -.12         -.41
Quality of work                   .74         -.19    .13         [illegible]  [illegible]
Volume of work                    .84         [remaining entries illegible; one reads -.06]
Ability to follow directions      .40         -.12    .60          .16          .10
Acceptance of responsibility      .16         -.22    .28          .14         [illegible]
Resourcefulness                  [illegible]   .00   [illegible]   .27         -.20
Personal grooming                 .40         -.10    .54         -.08          .26
Attendance (rated)                .35         -.56    .31          .24         -.06
Discretion re patient and
  hospital business               .16         -.13    .05          .46         [illegible]
Relationship with others          .43         -.32    .43         [illegible]   .07

Note.—Factors shown are orthogonal and were obtained 
using a principal-axes solution with Varimax rotation. N = 86. 


useful, and Bessent and Gloye (1967) found 
the Pd, Mf, Pa, and Ma scales best. No 
doubt differences in outcomes of success em- 
ployed in the job requirements are responsible 
for differences of this kind. 

Particularly interesting findings of the 
present study were: 


1. The K scale alone was a useful predictor 
for three criterion variables for aides (viz. 
quality of work, ability to follow direc- 
tions, and resourcefulness), but was of no 
significance in predicting the performance of 
orderlies. 

2. The F scale was significantly correlated 
with absenteeism for both aides and orderlies. 
However, absenteeism was positively cor- 
related with F for aides and negatively for 
orderlies. No explanation is offered for this 
finding. 

3. Every one of the 10 basic MMPI scales 
except the Si scale was significantly correlated 
with at least one of the individual criterion 
variables. Application of the K suppressor 
variable to five of the scales did not materially
increase their predictive efficiency except
for the Pd scale. For this scale the number 
of criterion variables predicted at a signifi- 
cant level was increased from two to seven for 
orderlies when K was applied. 

4. Divorced status yielded the highest cor- 
relation (.50) with an individual criterion of 
any of the predictor variables, either bio- 
graphical or MMPI. This variable also cor- 
related with five other criterion measures, 
making it the most useful biographical 
predictor and among the most useful of all 
predictors. 

5. Education was not significantly cor- 
related with any individual job-performance 
measure. This occurred despite the fact that 
the amount of education in the sample studied 
ranged from 8 to 14 yr. However, it was noted 
that, in predicting the composite performance 
criterion for orderlies, education did appear 
as one of the most useful variables. 

6. Although there was a very great range 
of ages represented in the sample, this vari- 
able did not appear to be a useful predictor, 
except that older orderlies were rated as more 
resourceful. 

7. Biographical variables were more impor- 
tant than MMPI scores for predicting com- 
posite job performance. The Hy and K scales 
were the only MMPI variables included in 
the multiple regression prediction, and these 
applied to aides only. 


The factor analyses (Tables 5 and 6) revealed
similar structures for both types of
jobs. Four orthogonal factors were extracted
for the aides and five for orderlies. These 
factors were named as follows: Aides—I. Gen- 
eral rater bias (halo effect) reflecting ade- 
quacy of overall job performance, II. Absen- 
teeism, III. Personal adjustment, IV. Initia- 
tive. Orderlies—I. General rater bias (halo 
effect) reflecting adequacy of overall job per- 
formance, II. Absenteeism, III. Personal 
relationship with supervisor, IV. Discretion in 
dealing with patients, V. Personal adjustment. 

The appearance of a “resourcefulness and 
initiative” factor specific to aides, and a 
“personal relationship with supervisor” factor 
specific to orderlies is consistent with the 
differences described earlier between these two 


jobs. On the one hand, aides are more directly 
and immediately involved in emergency prob- 
lems with patients and thus have the oppor- 
tunity to be resourceful and to demonstrate 
initiative. On the other hand, each orderly 
works simultaneously for several nursing 
supervisors, a situation that renders the abil- 
ity to establish satisfactory personal relation- 
ships with others especially important. 

It is emphasized that the results of this 
study must be interpreted in light of the fact 
that the sample studied was a group of pres- 
ent employees and not unselected applicants. 
Had the study been based on the latter 
group, correlations of the various predic- 
tors with performance criteria would have 
been moderately greater probably without 
exception. 
SKIMMING LISTS OF FOOD INGREDIENTS PRINTED
IN DIFFERENT SIZES 1


E. C. POULTON 2 


Applied Psychology Research Unit, Cambridge, England 


72 housewives aged 22-64 yr. searched for particular words in lists of ingredi- 
ents printed in 10-, 7.5-, 6-, and 4-pt. Univers lower case type with 10% 
leading. There were 4 sets each of 15 lists, and a 4 X 4 factorial design was 
used confounding order and sets of lists. Illumination levels of 40 and 2 ftc. 
were tried in separate experiments. There was a large (p< .001) drop in the 
rate of locating ingredients when the size of print was reduced from 6 to 4 pt. 
Increasing the size from 6 to 10 pt. had less effect (p< .05). In the dim light, 
4 housewives over 50 yr. could not locate any ingredients printed in 4 pt. The 
conclusion was that ingredients should not be printed in lower case Univers 
smaller than about 6 pt., which has the same x height or apparent size as 8-pt.
book type.


It is clear from common observation that
housewives do not read through the complete
list of ingredients on each package of food
which they buy. If they look at all, they
probably search the list to see only if the food
contains an ingredient they particularly want
it to contain, or perhaps an ingredient which
they do not want it to contain. Assuming this
to be the case, the clarity of print for lists of
ingredients would be tested most appropriately
by asking housewives to search the lists
for particular ingredients. This is the method
which has been used here.

Small food containers have sufficient surface
available only for small amounts of
print. If lists of ingredients are to be printed
in full, the print may have to be minute. One
aim of the present experiments was to determine
the smallest size of print which housewives
could be expected to read. It was hoped
to be able to specify a minimum size of print,


1 This research was carried out at the request of
The Metal Box Co. Ltd., which kindly supplied the
printed materials. The British Food Manufacturers
Federation kindly defrayed the cost of the housewives.
The author is also grateful to I. Harris and
[initial illegible] Harris of The Metal Box Co. for their help and
encouragement. P. M. E. Altham kindly advised on
the experimental design. Financial support from the
British Medical Research Council is also gratefully
acknowledged.

2 Requests for reprints should be sent to the author,
Medical Research Council, Applied Psychology
Research Unit, 15 Chaucer Road, Cambridge, England.
below which there would be a marked fall in
the rate at which ingredients could be read. 

Considering the large amount of work 
which has been carried out upon the clarity 
of print (Burt, 1959; Luckiesh & Moss, 1942; 
Tinker, 1963), it might be presumed that the 
minimum size of print for food containers 
could be determined by reference to existing 
experimental data. However a thorough search 
of the literature indicated that this was not 
so. Most of the work has been concerned 
more with the optimal size of print than with 
the minimal acceptable size. 

Luckiesh and Moss (1942, Table XVII) 
compared 6-pt. Textype set 46 characters to 
the line with 1-pt. leading with 8-, 10-, and 12- 
pt. set 52 characters to the line with 2-pt. 
leading. They used blink rate as their cri- 
terion of legibility and found practically no 
differences. If anything, the 8-pt. Textype 
was the most legible by this criterion. How- 
ever Tinker (1946) has shown since that the 
rate of blinking is not a valid measure of 
difficulty in reading. In Luckiesh and Moss’ 
experiments rate of reading hardly varied with 
legibility, presumably because their readers 
knew that they were not to be tested for com- 
prehension. 

Burt (1959, p. 12) compared sizes of 
Times Roman ranging only from 8 to 14 pt. 
Unfortunately he did not tabulate his experi- 
mental data nor indicate what statistical 
tests, if any, he carried out. He simply stated 
that 10-pt. prose was the most legible. Most
students found 9 pt. equally legible, while
older persons often did best with 11 or even
12 pt.

Tinker and Paterson also were concerned 
principally with the optimum size of print. 
However, in their (1932) study 6½-pt. Mergenthaler’s
Ionic linotype on a 7-pt. body was
reduced by the planographic offset-printing 
process. Compared with a control group, they 
found a 2% reduction in the students’ mean 
rate of reading when the size of the type was 
reduced to 80%. Reducing the size to 50% 
reduced the mean rate of reading by 11%, a 
fall which was reliable statistically. But the 
catastrophic fall in rate of reading was shown 
only by a reduction in the size of the type to 
30%. At this point the mean rate of reading 
had fallen by 74%; 15 of 90 students scored 
zero. On this evidence 3.5-pt. newspaper 
print should be just acceptable for students. 

Unfortunately students are not comparable 
to housewives, since most are young. Set 
against this, newspaper type is designed to be 
read in its existing size; it is not designed to 
be reduced in size. Normally a revised set of 
letters has to be designed for every change of 
about 2 in point size, since smaller letters 
require relatively thinner lines and larger 
spaces within the letters (“counters”). If this 
holds for Mergenthaler’s Ionic linotype, it is 
possible that a properly designed 3-pt. type 
might have been as acceptable for students as 
the 6½-pt. linotype reduced to 50%.

In order to produce data like Tinker and 
Paterson’s, but relevant to the printing of 
ingredients on food containers, it was decided 
to use (a) housewives with a wide range of 
ages, rather than students, and (b) a typeface
like Univers which was designed to be
reduced photographically. 


METHOD 
Materials 


Lists of ingredients were taken from the contain- 
ers of 60 different manufactured foods. The lists 
averaged 17 words, with a range from 6 to 49. They 
were sorted by length into four comparable sets of 
15. Within each set the lists were numbered 1 
through 15. They were printed in 10-pt. Univers 
(Monotype series No. 689) with 1-pt. leading in two 
columns on a quarto sheet of paper 10 X 8 in. At the 
top of each list, next to its number, was the name of 
the food printed in capitals. The ingredients were 


printed in lowercase, with capitals only for the 
first letter of each list and for the letter codes of 
vitamins, for example, vitamin B:. The lists were 
set in unjustified lines which ranged in length from 
25 to 35 letter spaces. Each list was reproduced 
photographically four times, in sizes to give about 
10-, 7.5-, 6-, and 4-pt. letters (see Figure 1) with 
leading ranging from 1.0 to 0.4 pt. This insured that
the words in a list did not change positions when 
the size of the lettering was changed. 

One word in each list was selected as a target. For 
the first eight lists in each set, two of the target 
words were at the start of a line, two were at the 
end, and the remaining four were neither at the 
start of a line nor at the end. For the last seven 
lists (which most housewives did not reach), there 
were again two target words at the start and end 
of a line, but only three with intermediate positions. 
If a housewife searched each list only until she 
reached the target word, the number of words she 
would have needed to scan in the first eight lists of 
each set ranged from 76 to 87. For the remaining 
seven lists the number ranged from 74 to 94. A 
stenciled question sheet was prepared for each of 
the four sets of lists. Like the corresponding set of 
lists, it carried the list numbers and food names 
typed in capitals. Below each name was the single 
target word typed in lowercase. 

There were also two practice sets, each comprising 
six lists, and two practice question sheets. The first 
practice set was stenciled like the question sheets; 
the second was printed in Univers, and reproduced 
four times, once with each of the four test sizes of 
letters. 


Experimental Conditions 


Two experiments were run under different levels of 
illumination corresponding, respectively, to super- 
markets and domestic food closets. In the first there 
was a fairly uniform level of about 40 ftc. on the 
tops of the tables at which the housewives sat. The 
lighting came from four fluorescent lamp fittings 
hanging 9 ft. above the floor. In the second experi- 
ment the level of illumination was reduced to about 
2 ftc. by wrapping layers of pale blue cotton 
sheeting around the fluorescent lamps. The illumi- 
nation levels were measured using a photometer 
made by Salford Electrical Instruments Ltd. 

In each experiment the four sizes of print and 
four sets of lists were arranged in a Latin-square 
design with four groups of housewives. The lists 
were always presented in the same order, thus con- 
founding practice effects with list difficulty. There 
were altogether 32 housewives in the experiment 
with the bright lighting and 40 in the experiment 
with the dim lighting. Housewives were randomly 
allocated to conditions. 
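
The counterbalancing just described can be sketched as follows. The report does not reproduce the actual square, so the cyclic arrangement below is an assumption, and the names `SIZES_PT` and `cyclic_latin_square` are hypothetical. The sketch shows the two properties the design relies on: every group meets each size of print once, and every serial position carries each size once, while the set of lists stays tied to serial position.

```python
SIZES_PT = [10, 7.5, 6, 4]  # the four sizes of print named under Materials


def cyclic_latin_square(treatments):
    """Row g, column p holds the treatment for group g at serial position p."""
    n = len(treatments)
    return [[treatments[(g + p) % n] for p in range(n)] for g in range(n)]


square = cyclic_latin_square(SIZES_PT)
for g, row in enumerate(square, start=1):
    # Set p+1 is always given at position p+1, so set and order are confounded.
    plan = ", ".join(f"set {p + 1} in {size}-pt." for p, size in enumerate(row))
    print(f"Group {g}: {plan}")
```

Because the sets of lists always appear in the same order, differences between serial positions cannot be separated from differences between sets; only the comparison between sizes of print is balanced across both.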


Procedure 


The experiments were conducted on groups of 
about 20 housewives seated at tables. Each house- 
wife was handed a cardboard file containing her 
test passages and questions, and a black ball-point
pen. The order of her experimental conditions dif- 
fered from that of her immediate neighbors. The 
first practice set of lists was used to teach the pro- 
cedure. The housewife had to read the target word 
on her question sheet, find the word in the corres- 
ponding practice list, and cross it out. She was 
asked to work through the list as quickly as possi- 
ble. The size of print of the second practice set of 
lists was the same as that of the first experimental 
condition, to insure that the housewife was thor- 
oughly familiar with the experiment before she 
started. The procedure of the second practice was 
identical with that used in the experiment. 

Before each part of the experiment the housewife 
took a stenciled question sheet from the appropriate 
compartment of her file and placed it on her left if 
she was right-handed. When everyone was ready, she 
took the corresponding test sheet from her file, placed 
it on her right, and started searching for the target 
words. She was allowed 25 sec. only. No one ever 
crossed out more than 13 of the 15 target words in 
this time. The experiment and practice together took 
about 25 min. 


Experimental Subjects 


The 72 housewives were members of a panel main- 
tained at the Applied Psychology Research Unit at 
Cambridge. Their ages ranged from 22 to 64 yr., 
with a median age of 46 yr. Seventy percent wore 
reading glasses for the experiment, and one more 
said that she ought to have done so. They were paid 
7/6 per hour (about $.90) for their services, plus 
traveling expenses. 


RESULTS 


Figure 1A gives the results of the very first 
experimental condition, before the housewives 
had tried any of the other sizes of lettering. 
Comparisons are subject to chance differ- 
ences between the groups of housewives re- 
ceiving each condition, but are free from 
transfer effects (Poulton & Freeman, 1966). 
In both the bright and the dim illumination 
the difference between the 4-pt. and 6-pt. 
type was reliable at the .05 level of signifi- 
cance on a two-tailed Mann-Whitney U test 
(Siegel, 1956, pp. 116-127). In the bright 
illumination the difference between 6-pt. and 
10-pt. Univers was just reliable (p < .05) on 
a one-tailed test. In the dim illumination none 
of the differences between 6-, 7.5-, and 10-pt. 
type was reliable statistically (p> .05). 
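
For readers unfamiliar with the statistic, the U value behind these first-trial comparisons can be computed as in the sketch below. The counts are invented purely for illustration and are not data from the experiments; the function name `mann_whitney_u` is hypothetical, and the code returns only the statistic, which would then be referred to tabled critical values such as those in Siegel (1956).

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs in which a value from sample_a exceeds a value
    from sample_b; a tied pair counts one half. U_a + U_b = len(a) * len(b).
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical numbers of target words found by two separate groups of
# 8 housewives (illustrative values only, NOT results from the study).
found_6pt = [7, 8, 6, 9, 7, 8, 10, 6]
found_4pt = [3, 5, 4, 6, 2, 5, 4, 3]

u_6, u_4 = mann_whitney_u(found_6pt, found_4pt)
# The smaller of the two U values is the one compared with the table.
print(min(u_6, u_4))
```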

Figure 1B gives the results pooled over 
housewives and orders of conditions. The 
only comparisons subject to chance differences 
between the groups of housewives are between 
FIG. 1. Mean numbers of ingredients found in 25
sec. with different sizes of Univers lettering. (Un- 
filled points: about 40 ftc. on table top. Filled points: 
about 2 ftc. For the first trial, A, each unfilled point 
represents the mean performance of a separate group 
of 8 housewives, each filled point, the performance of 
10 housewives. For all trials taken together, B, the 
broken line represents the performance of the same 
32 housewives; the unbroken line is for 40 house- 
wives.) 


the two levels of illumination. Within each 
level of illumination the results for size of 
print are subject to unknown transfer effects 
from previous conditions with other sizes of 
lettering. An analysis of variance followed by 
Tukey’s range test (Ryan, 1959, Appendix) 
was carried out separately on the data for 
each level of illumination. These statistical 
tests indicated that at both levels of illumi- 
nation there was a highly (p < .001) reliable 
difference between the 4-pt. and 6-pt. type. 
In the bright illumination there was no sta- 
tistically reliable difference between 6-, 7.5-, 
and 10-pt. Univers. But in the dim illumina- 
tion the 10-pt. Univers was reliably better 
than any of the smaller sizes (p < .05 or bet- 
ter). 

When presented with the small 4-pt. type in 
the dim light, 4 of the 40 housewives failed to 
locate any target words. All 4 were over 50 
yr. old. Four other housewives located only 
one target word during the 25 sec. available. 
In the bright light, 3 of the 32 housewives
also located only one target word.
The 7 housewives who located only one target 
word were all over 45 yr. old. 


DISCUSSION 


It is clear from Figure 1A and B that under 
certain conditions 10-pt. Univers may be 
preferable to smaller sizes such as 7.5 or 6
pt. However, the main finding is the very 
large fall in the rate of locating target words 
when the size of the type was reduced from 
6 to 4 pt. It suggests that 6-pt. Univers is 
about as small as food manufacturers should 
print the ingredients on containers if they are 
to be reasonably clear and legible to house- 
wives of all ages. 

The apparent size of lowercase print is de- 
termined principally by the x height (the 
height of the rounded parts of the letters, ex- 
cluding the ascenders and descenders), not by 
the overall height or point size (Poulton, 
1965, Figure 1). Univers is a modern sanserif 
face with an x height which is large in propor- 
tion to its face size. In order to find a con- 
ventional serifed book typeface such as Mod- 
ern Extended No. 1 (Monotype Series No. 
7), Baskerville (No. 169), or Bembo (No. 
270) with the same x height, it is necessary to 
add one-third to the point size (Poulton, 
1965, Table 1). Thus, if 6-pt. Univers is 
taken as the minimal acceptable size for in- 
gredients, this is equivalent to about 8-pt. 
Modern Extended, Baskerville, or Bembo. 
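
The conversion is simple arithmetic and can be written out as a brief sketch. It applies only the rule quoted above (add one-third to the Univers point size); the function name `book_type_equivalent_pt` is hypothetical, and the rule is taken here as an approximation for the three book faces named, not as an exact typographic relation.

```python
def book_type_equivalent_pt(univers_pt: float) -> float:
    """Point size of a conventional serifed book face with about the same
    x height, found by adding one-third to the Univers point size."""
    return univers_pt * (1 + 1 / 3)


# The four sizes of Univers used in the experiments.
for size in (10, 7.5, 6, 4):
    equivalent = book_type_equivalent_pt(size)
    print(f"{size}-pt. Univers is roughly {equivalent:.1f}-pt. book type")
```

For the recommended minimum of 6-pt. Univers the rule gives 8 pt., the figure stated in the text.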

The principal difference between the pres- 
ent results and those of Tinker and Paterson 
(1932) outlined previously must be due to 
the different age range of the experimental 


subjects. Older people not only require a 
rather larger size of print for the optimal con- 
dition (Burt, 1959, p. 12), but also they can- 
not all read 4-pt. Univers in poor illumina- 
tion. This has the same apparent size as 5.5- 
pt. normal book type. 
Most previous studies of the motivational effects of knowledge of results have
failed to control for differential goal setting by Ss in the different knowledge 
conditions. The present study attempted to separate the effects of knowledge 
qua knowledge from those of goal setting using a 2 X 2 factorial design. The 
task was simple addition. The factors were knowledge of (raw) score (KR) vs. 
no knowledge of (raw) score (No KR), and hard vs. easy goals. Scores in the 
KR condition were given in such a form that they could not be used to set goals. 
The hard- and easy-goal Ss, on the other hand, were informed only of their 
progress in relation to a standard set by E. It was found that the hard-goal Ss 
worked significantly faster than the easy-goal Ss, but the KR and No KR 


groups did not differ in performance. 


The positive effects of knowledge of results 
(KR) on learning and performance are firmly 
established in the research literature (e.g., 
Ammons, 1956; Annett, 1961; Bilodeau & 
Bilodeau, 1961; Vroom, 1964). However, the 
question of how KR facilitates performance 
has not yet been answered. 

The present paper is concerned only with 
the motivational function of KR. Thus, atten- 
tion will be focused on those types of KR 
which have few or no cue or directive proper- 
ties (i.e., KR which does not inform one of 
the nature and locus of one’s errors or suggest 
how they might be corrected). Examples of 
purely or predominantly motivational KR 
would be knowledge of total score on a task 
summed over a number of trials or KR on 
simple psychomotor tasks which the subject 
(S) already knows how to perform. While it 
is true that KR of this type could be in- 
terpreted by S as a signal or cue to change 
his method of performing the task, it would 
not tell him what changes to make or how to 
go about correcting his errors. Such KR 
would have no directive function. (See Payne 
& Hauty, 1955, for a similar distinction be- 
tween directive and motivational KR.) 


1 This research was supported by Grant No. MH 
12103-01A1 from the National Institute of Mental 
Health. 

2Now at the University of Maryland. Requests 
for reprints should be sent to Edwin A. Locke, De- 
partment of Psychology, University of Maryland, 
College Park, Maryland 20740. 
A strong argument can be made for the
thesis that KR affects performance indirectly: 
by influencing the nature of the goals the 
individual sets on the task. It can be ob- 
served by introspection that knowledge by 
itself does not have the power to initiate 
action. Man is necessarily selective in his use 
of information. The actions a man takes with 
respect to an item of knowledge depend upon 
the perceived significance of that knowledge to 
him. If an individual (in an experimental 
setting) appraises KR as signifying inade- 
quate performance on a task, he will ordi- 
narily set a goal to improve his subsequent 
performance. If he appraises KR as signifying 
adequate or superior performance, he will 
ordinarily set a goal to maintain or reduce 
his level of effort. And if the KR is appraised 
by him as irrelevant, he will take no action at 
all regarding it. Thus to predict performance, 
it is not sufficient to know that an individual 
was given KR; one must also know what he 
decided to do about it, that is, what goals he 
set in response to it. 

If KR motivates performance through or 
by means of its effects on goal setting, this 
means that KR should have no effect on per- 
formance when differential goal setting is 
controlled. 

In this context, it is relevant to ask whether 
effects attributed to (motivational) KR in 
previous studies could have been due to dif- 
ferential goal setting associated with the dif- 
ferent KR conditions. An examination of the 
literature on motivational KR indicates that
the two variables have, in fact, been fre- 
quently confounded. (For a detailed review 
of these studies see Locke, Cartledge, & 
Koeppel, 1968.) 

Book and Norvell (1922), Crawley (1926), 
and Mackworth (1950), for example, assigned 
their KR Ss specific goals to aim for, while 
No KR Ss were typically told to “do their
best.” (Previous studies have found that spe- 
cific hard goals lead to a higher performance 
level than a goal of “do your best”; Locke & 
Bryan, 1967.) 

In other studies (Adams & Humes, 1963; 
Church & Camp, 1965; McCormack, Binding, 
& Chylinski, 1962; McCormack, Binding, & 
McElheran, 1963; Payne & Hauty, 1955) 
goals were not explicitly assigned to Ss, but 
KR was always expressed in relation to a 
standard. For example, KR Ss would be told 
whether or not they had surpassed their im- 
mediately previous score on the task. When 
an experimental S is informed that he has 
failed to surpass a standard set by the experimenter
(E), it is highly likely that he will
“try harder” on subsequent trials. Similarly 
he might be expected to relax somewhat if 
told that his performance had exceeded ex- 
pectations. 

There have been a number of studies of 
(motivational) KR in which there was no 
obvious manipulation of goal setting by E 
(Arps, 1920; Brown, 1932; Johanson, 1922; 
Mace, 1935; Manzer, 1935), but the possibil- 
ity that Ss in the KR and No KR conditions 
set different goals spontaneously cannot be 
ruled out. Mace, in fact, explained his results 
by arguing that the KR Ss set themselves 
harder implicit goals on the task than the No 
KR Ss. 

The potential importance of controlling for 
differential goal setting was indirectly empha- 
sized in a recent study by Chapanis (1964). 
He “hired” his Ss as employees rather than 
as “experimental” Ss and ran them individu- 
ally for an hour a day for 24 days. The task 
was punching digits onto a paper tape. Chap- 
anis found no differences in the output of the 
KR and No KR groups. Two characteristics 
of this study that may have lessened the 
motivation of the KR Ss to set specific goals 
were (a) the probable absence of the “demand 


characteristics” (Orne, 1962) which are in- 
herent in most experimental situations, such 
as the implicit demand to “improve” one’s 
performance, and (b) the absence of the opportunity
for inter-S competition which is
present in many studies of this type (e.g., 
Gibbs & Brown, 1955). 

Both the foregoing discussion and previous 
findings in this area point to the desirability 
of experimentally separating the effects of 
KR per se from those of goal setting. In a 
previous study using a 2 X 2 design, Locke 
(1967a) gave half his Ss knowledge of their 
actual scores on an addition task, while half 
received no KR. The opportunity for differ- 
ential goal setting by these groups was re- 
duced by using trials of alternating (10 and 
15 min.) lengths, so that scores on consecutive 
trials were not comparable. Half the Ss were 
told to “do their best” on the task, while half 
were assigned specific hard goals to reach. 
The goals were indicated by colored cards 
placed in Ss’ box of problem cards designating 
how far they were supposed to get on that 
trial. No effect of KR condition on perform- 
ance was found in this study, whereas there 
was a significant goal effect in favor of the 
hard-goal Ss. 

The present study was also designed to 
separate KR and goal-setting effects, but in- 
corporated a number of methodological im- 
provements over the previous study by Locke 
(1967a). In that study the No KR Ss could 
have gotten some idea of their progress, as 
the box of problem cards was clearly visible
in front of them. Further, the alternating 
trial lengths did not prevent Ss from com- 
paring their scores on every other trial and 
setting personal goals. Finally the hard-goal— 
No KR Ss could have obtained knowledge of 
their scores by observing where the goal card 
was placed in the box. The addition task was 
structured in the present study so that these 
problems were avoided. 

A further problem with the previous study 
concerned the “do best” goal. Although this 
goal was used purposely in order to duplicate 
the goals that had been used by earlier inves- 
tigators (Book & Norvell, 1922; Crawley, 
1926; Locke & Bryan, 1966), there was no 
way of specifying a priori just what level of 
motivation such a goal would induce. In the 
present study it was decided to use specific 
hard goals and specific easy goals rather than 
hard and “do best” goals. Numerous previous 
studies have indicated that hard goals consistently 
produce a higher level of performance 
than easy goals (see Locke, 1968, for a 
summary); thus the same prediction was made 
for the present study. No effect of KR per se 
on performance was anticipated. 

It should be mentioned that although it is 
possible to give KR in such a form that goal 
setting is completely ruled out, it is not possi- 
ble to set Ss’ goals without giving them some 
knowledge, either of the goals themselves 
and/or of their progress in relation to these 
goals. In the present study the terms “hard 
goal” and “easy goal” will be used to de- 
scribe groups given knowledge of their prog- 
ress in relation to goals set by E, while the 
terms KR and No KR will be used to refer to 
groups given or not given knowledge of their 
actual (raw) scores on the task. 


METHOD 
Subjects 


The Ss were 23 male and 17 female University of 
Maryland volunteers who were paid for participa- 
tion. The Ss were run individually. 


Task 


The task was simple addition; each problem con- 
sisted of three two-digit numbers. The problems were 
typed on a roll of paper which was wound on a 
spool. The spool was placed in a box with a trans- 
parent window. The S advanced one problem at a 
time to the window by turning a knob. Answers 
were written on separate answer sheets which con- 
tained spaces for 60-90 answers. (The number and 
arrangement of the spaces on the page varied from 
sheet to sheet.) Problems on the spool were matched 
with answer spaces by means of random numbers 
which identified each problem and its corresponding 
answer space. These numbers told S where to write 
his answers, but could not reveal how many prob- 
lems he had done. (The S did not have to search 
the whole page for the right space since the answer 
sheets were set up so that S worked across the page 
line by line, as in reading a book.) When an answer 
sheet was completed, S began immediately on the 
next sheet. At the end of the trial all sheets for that 
trial were handed in to E in an adjoining observa- 
tion room through a hole in the wall. 

There were five trials, each of a different length, 
ranging from 8 min. 45 sec. to 15 min. 30 sec. (M = 
12 min.). Trials were separated by 3-min. rest 
periods. The Ss were interrupted at specific intervals 
during each trial and given feedback as described 
below. There were a total of 30 interruptions during 
the five trials, the average interval between inter- 
ruptions being 2 min. (range: 50 sec. to 3 min. 25 
sec.). The Ss knew that the interruption periods and 
trial times all differed in length, but did not know 
what the lengths were. 


Design and Procedure 


The design was a 2 X 2 factorial model with 10 Ss 
per cell. The variables were KR versus No KR and 
hard versus easy goals. To assign Ss to cells, the 40 
Ss were first ranked in order of ability as determined 
by a 3-min. addition pretest. The four Ss of lowest 
ability were then assigned at random to one of the 
four cells, the next four Ss were treated identically, 
and so on until all Ss had been assigned to a cell. 
This had the effect of equalizing the cell ability 
means while retaining random selection. 
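
The matched assignment described above can be sketched in a few lines of Python. This is only an illustration, assuming that "assigned at random" means a random permutation of the four cells within each block of four ranked Ss; the function name, the seed, and the pretest scores are hypothetical.

```python
import random


def assign_matched_blocks(pretest_scores, n_cells=4, seed=None):
    """Rank subjects by pretest score, then randomly assign each
    consecutive block of n_cells subjects to the n_cells cells."""
    rng = random.Random(seed)
    # Subject indices ordered from lowest to highest pretest score.
    ranked = sorted(range(len(pretest_scores)), key=lambda i: pretest_scores[i])
    assignment = {}
    for start in range(0, len(ranked), n_cells):
        block = ranked[start:start + n_cells]
        cells = list(range(n_cells))
        rng.shuffle(cells)  # random order of cells within this block
        for subject, cell in zip(block, cells):
            assignment[subject] = cell
    return assignment


# Hypothetical pretest scores for 40 subjects (illustrative only).
_score_rng = random.Random(1)
scores = [_score_rng.randint(10, 40) for _ in range(40)]
assignment = assign_matched_blocks(scores, seed=0)
```

Because every block of four adjacent ranks contributes one S to each cell, the cell means on the pretest stay close together while the assignment within each block remains random.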

No knowledge of score (No KR). Half the Ss 
were not given their scores at any time during the 
experiments. 

Knowledge of score (KR). Half the Ss were told 
the cumulative number of problems they had at- 
tempted on that trial at each interruption point. 
These interruptions were always made after S had 
just completed a problem and was going to the next 
one, never when S was actually working on a prob- 
lem. At the end of each trial, S was told how many 
problems he had gotten correct on that trial. (This 
could not be done during the trial as E did not 
have the answer sheets.) 

Easy goal. Half the Ss were assigned easy output 
goals, in terms of number correct, to work for on 
each trial. To reach these goals, S had to work at 
67% of his rate (problems correct per minute) 
on the 3-min. practice trial. The S was told that the 
assigned rate was slower than his practice trial, but 
was not given the actual percentage figure. 

The Ss were not told their actual numerical goals, 
but were informed of their progress in relation to 
their goal by means of a set of four lights displayed 
on a console. The illumination of the lights was con- 
trolled by E. When a 1-in. (diameter) red light was 
illuminated, this indicated that S was more than 
three problems behind the pace required to reach 
his goal for that trial; the illumination of a 4-in. red 
light indicated that S was one to three problems be- 
hind his assigned pace; the illumination of a 4-in. 
green light indicated S was one to three problems 
ahead of his assigned pace; and the illumination of 
a 1-in. green light indicated that S was more than 
three problems ahead of his assigned pace for that 
trial. The E turned on the appropriate light at each 
interruption point to indicate S’s rate in relation to 
his goal at that point in the trial. 
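
The light rule can be written as a small function. This is a sketch under stated assumptions: the text does not say which light was shown when S was less than one problem ahead or behind, so the code treats "exactly on pace" as ahead; `feedback_light`, `attempted`, and `required` are hypothetical names, and the lights are labeled "large" and "small" rather than by diameter.

```python
def feedback_light(attempted, required):
    """Return the light shown at an interruption point.

    attempted: cumulative problems S has attempted so far on the trial.
    required:  cumulative problems the assigned pace calls for by now.
    """
    diff = attempted - required
    if diff < -3:
        return "large red"    # more than three problems behind
    if diff < 0:
        return "small red"    # up to three problems behind
    if diff <= 3:
        # Up to three ahead; diff == 0 is placed here by assumption.
        return "small green"
    return "large green"      # more than three problems ahead
```

For example, an S who has attempted 13 problems when the pace requires 15 would see the small red light.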

The Ss were informed in advance as to what the 
different lights signified and were instructed to speed 
up in response to the red lights. They were told that 
the lights indicated their performance relative to 
their goal only in terms of number attempted. The E 
kept track of S’s number of problems attempted by 
observing his progress through a one-way mirror. At 
the end of each trial (after E had corrected the 
answer sheets), S was told whether or not he had 
beaten his goal for that trial in terms of number 
correct. 

Hard goal. Half the Ss were assigned hard output 
goals to reach on each trial. To reach his goal, a 
hard-goal S had to work at the same pace he had 
worked at during the 3-min. practice trial. The Ss 
were told that they had to work at 100% of their 
practice-trial pace in order to reach their goals. 
Feedback was given to these Ss with lights as with 
the easy-goal Ss. 

To summarize, each S was interrupted 30 times 
during the five trials. The S was told his cumulative 
number of problems attempted (if in the KR condi- 
tion) and was informed (by means of lights) of his 
work pace in relation to his assigned goal. At the 
end of each trial S was told his number correct (if 
appropriate) and whether or not he had beaten his 
goal on that trial. 

The design of the task insured that KR and goal 
setting were separated. The KR Ss could not use 
their knowledge to set personal goals, and the No KR 
Ss could not use the goal feedback to get knowledge 
of their actual scores. The goal Ss, of course, had 
some knowledge, but it was expressed only as a rela- 
tion between their performance and a standard set 
by E. 

The E remained in an observation room during 
the experiment, but kept track of S’s progress through 
a one-way mirror (of which S was made aware). 
Communication was possible through an intercom 
system. 


RESULTS 


Three different performance criteria were 
used: (a) deterioration in number attempted, 
defined as the difference between 
mean number of problems attempted per minute 
on the practice trial and the mean number 
attempted per minute in the experimental 
period (the deterioration in rate was due to 
the experimental trials being longer than the 
practice trial), (b) deterioration in number 
correct, same as above, but in terms of number 
correct, and (c) percentage of errors, defined 
as the total number of problems wrong/ 
total number of problems attempted during 
the experimental period. 
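
As a worked illustration of the three criteria, the sketch below computes them for one hypothetical S. The function name and all the numbers are invented for the example; only the definitions come from the text.

```python
def performance_criteria(practice_attempted, practice_correct, practice_min,
                         exp_attempted, exp_correct, exp_min):
    """Return (deterioration in attempted, deterioration in correct,
    percentage of errors) for a single subject."""
    # Criterion (a): drop in problems attempted per minute.
    det_attempted = practice_attempted / practice_min - exp_attempted / exp_min
    # Criterion (b): drop in problems correct per minute.
    det_correct = practice_correct / practice_min - exp_correct / exp_min
    # Criterion (c): wrong answers as a percentage of problems attempted.
    pct_errors = 100 * (exp_attempted - exp_correct) / exp_attempted
    return det_attempted, det_correct, pct_errors


# Hypothetical S: 3-min. practice trial, 60 min. of experimental work.
criteria = performance_criteria(30, 27, 3, 540, 486, 60)
```

A positive deterioration score means the S worked more slowly in the experimental period than on the practice trial.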

The number attempted curves for the hard- 
and easy-goal Ss are shown in Figure 1a, and 
the corresponding curves for the KR and No 
KR Ss are shown in Figure 1b. It is evident 
that the hard-goal Ss maintained a faster 
work pace than the easy-goal Ss, but the KR 
and No KR Ss worked at virtually the same 
pace. 

The data were analyzed using a standard 
2 X 2 factorial analysis of variance. The goal 
effect was significant for the number at- 
tempted criterion, F(1, 36) = 5.50, p< .05, 
and for the percentage of errors criterion, 
F(1, 36) = 7.67, p< .01, the hard-goal Ss 
attempting significantly more problems and 
making significantly more errors than the 
easy-goal Ss. The difference for the number 
correct criterion also favored the hard-goal Ss, 
but did not reach significance, F(1, 36) = 
[illegible]. 

There was no significant KR effect for any 
of the criteria (all Fs < 1.02), nor were 
there any significant interaction effects (all 
Fs < 1). 
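
For readers who want to see where the F ratios and the 36 error degrees of freedom come from, here is a minimal sketch of a balanced 2 X 2 factorial analysis of variance. It is the textbook computation, not necessarily the exact routine used in the study, and the toy data are invented.

```python
from statistics import mean


def anova_2x2(cells):
    """Balanced 2 x 2 ANOVA.

    cells maps (a, b), with a and b in {0, 1}, to equal-length score lists.
    Returns the F ratios for both main effects and the interaction.
    """
    n = len(cells[(0, 0)])
    grand = mean(v for scores in cells.values() for v in scores)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = [mean(cells[(a, 0)] + cells[(a, 1)]) for a in (0, 1)]
    b_mean = [mean(cells[(0, b)] + cells[(1, b)]) for b in (0, 1)]
    # Each level of a factor contains 2 * n scores.
    ss_a = 2 * n * sum((m - grand) ** 2 for m in a_mean)
    ss_b = 2 * n * sum((m - grand) ** 2 for m in b_mean)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_error = sum((v - cell_mean[k]) ** 2
                   for k, scores in cells.items() for v in scores)
    df_error = 4 * (n - 1)  # 36 when there are 10 Ss per cell
    ms_error = ss_error / df_error
    return {"F_A": ss_a / ms_error, "F_B": ss_b / ms_error,
            "F_AB": ss_ab / ms_error, "df_error": df_error}


# Toy data with two scores per cell (illustrative only).
toy = {(0, 0): [1, 3], (0, 1): [2, 4], (1, 0): [5, 7], (1, 1): [6, 8]}
result = anova_2x2(toy)
```

Each effect has 1 degree of freedom in the numerator, which is why the reported tests take the form F(1, 36).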

The work rate (number attempted) of the 
two goal groups as a function of the length of 
the work period between interruptions was 
also compared. Dividing the intervals into 
those under 1 min. (n = 8), those between 
1 and 2 min. (n = 14), and those over 2 min. 
(n = 13), a two-factor repeated-measures 
analysis of variance was performed. There 
was a significant interaction between goal 
group and interval length, F(2, 76) = 3.92 
(p < .05). The hard-goal group worked at a 
faster rate during the longer intervals than 
during the shorter intervals, while the oppo- 
site was true for the easy-goal group. 

The empirical difference in goal difficulty 
between the hard and easy goals can be seen 
by comparing the distribution frequencies of 
the various feedback lights, shown in Table 1. 
The hard-goal Ss had large or small green 
light feedback (indicating they were ahead of 
their assigned pace in number attempted) 
only 37% of the time, whereas the easy-goal 
Ss had green light feedback 96% of the time. 
The chi-square value based on the frequency 
distribution shown in Table 1 is 536.38 (p < 
.001, df = 3). In terms of number of problems 
correct for the trials as a whole, the 
hard-goal Ss beat their goals on 10% of the 
trials, while the easy-goal Ss beat their goals 
on 86% of the trials. 
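
The green-light percentages and the degrees of freedom can be checked from the Table 1 frequencies. The sketch below assumes the columns run large red, small red, small green, large green (the column heads are not legible in the source) and uses the ordinary Pearson chi-square; the statistic computed this way need not reproduce the reported 536.38 exactly, since the article does not say how its value was obtained.

```python
# Table 1 frequencies; the column order is an assumption.
table = {
    "hard": [250, 129, 121, 100],
    "easy": [5, 22, 115, 458],
}


def pct_green(row):
    """Percentage of occasions on which either green light was shown."""
    return 100 * (row[2] + row[3]) / sum(row)


def pearson_chi_square(rows):
    """Pearson chi-square statistic and df for an r x c frequency table."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    total = sum(row_totals)
    stat = 0.0
    for r, r_tot in zip(rows, row_totals):
        for observed, c_tot in zip(r, col_totals):
            expected = r_tot * c_tot / total
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


chi2, df = pearson_chi_square([table["hard"], table["easy"]])
```

With two goal conditions and four lights, the table has (2 - 1)(4 - 1) = 3 degrees of freedom, matching the df reported in the text.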

The KR and No KR Ss did not differ sig- 
nificantly in the distribution of feedback lights 
encountered nor in the percentage of success 
in reaching their trial (number correct) goals. 

Interestingly, when the actual mean work 
rates (in relation to practice-trial rate) were 
compared after red versus green light feed- 
back, no differences emerged within either 
goal group. The hard-goal Ss worked at a 
relatively faster rate than the easy-goal Ss 
after both red and green light feedback. The 
t for the mean difference in rate after green 
light feedback was 2.88 (p < .01). The corresponding 
t for red light feedback was only 
1.64 (p < .05), but this lower value may 
have been due to the very small number of 
easy-goal Ss (n = 7) who experienced red 
light feedback. The absolute mean difference 
in work rate between the two goal groups 
was the same for both green and red light 
feedback. 

These findings indicate that the faster work 
rate of the hard-goal Ss cannot be attributed 
simply to the greater amount of red light 
feedback they received as compared with the 
easy-goal Ss. The hard-goal Ss adopted a 
generally faster work pace in response to all 
the lights. This is not surprising in view of 
the fact that the hard-goal Ss had to work 
33% faster than the easy-goal Ss to get the 
same light (the large red light excluded) to 
come on; that is, the standards regulating 
the light feedback were higher for one group 
than the other. If one assumes that all Ss 
were trying to get the green lights to come 
on, then the fact that one group had to work 
faster to do this than the other would explain 
their faster overall rate. 


DISCUSSION 


The finding that KR had no effect on performance 
when goal setting was controlled 
supports the results of Locke’s (1967a) study 
in which the two variables were less definitively 
separated. The positive relationship obtained 
between goal difficulty and performance 
level also supports the results of a number 
of previous studies (Day & Kaur, 1965; 
Locke, 1966, 1967b, 1968; Stedry, 1960). 


TABLE 1 
FREQUENCY OF FEEDBACK LIGHTS AS A FUNCTION OF 
GOAL CONDITION 

                      Feedback light 
Goal    Large red   Small red   Small green   Large green   Total 
Hard       250         129          121           100         600 
Easy         5          22          115           458         600 

Note. There were 20 Ss in each goal condition, and each S 
had feedback lights on 30 different occasions. For an explanation 
of the meaning of each light, see text. 


The fact that the hard-goal Ss surpassed 
the easy-goal Ss in number attempted but 
not in number correct is not surprising if 
one considers that KR during the trials was 
given in terms of rate (i.e., number at- 
tempted) rather than in terms of number cor- 
rect. The greater number of errors achieved 
by the hard-goal Ss can be attributed to their 
trying to speed up at the expense of accuracy. 

No definite explanation can be given for 
the interaction between pace and length of work 
interval for the two goal groups, but some 
hypotheses may be suggested. The slower rate 
of the easy-goal Ss during the longer intervals 
could be the result of a relaxation of effort 
over time on their part; such relaxation would 
be encouraged by their knowledge that they 
were (usually) ahead of their required pace. 
In contrast, the hard-goal Ss, who were typi- 
cally behind their required pace, may have 
been overly tense during the short trials, but 
may have relaxed enough during the longer 
trials to increase their rate. 

The present findings indicate that the 
results of previous studies in which KR 
and goal setting were simultaneously manipu- 
lated could be attributed solely to goal-setting 
effects. However, it remains to explain the 
results of KR studies in which goal setting 
was not explicitly manipulated (e.g., Arps, 
1920; Brown, 1932; Johanson, 1922; Mace, 
1935; Manzer, 1935). Extrapolating from the 
present findings, it may be hypothesized that 
the KR groups in these previous studies independently 
set themselves harder goals than 
did the No KR groups. A recent study by 
Locke and Bryan (1968) supports such an 
interpretation. One group of Ss was given 
KR on each of 16 5-min. trials of a computa- 
tion task, while another group did not receive 
KR. No goal-setting instructions were given. 
The KR group performed better than the 
No KR group on the last eight trials, but 
only because the KR Ss spontaneously set 
harder goals on these trials than the No KR 
Ss. When goal differences were controlled by 
partialing, the KR effect was vitiated. This 
finding supports the view that when KR does 
facilitate performance it does so through its 
effects on goal setting. 

This finding is perhaps not very surprising. 
It is difficult to imagine how KR could 
motivate performance unless it were given 
in such a form that the individual could ap- 
praise it in some manner. But appraisal 
requires a standard against which perform- 
ance can be judged. Typically an S uses his 
own previous performance or the perform- 
ance of another S (or, in the case of some 
experiments, standards supplied by E). Such 
standards, of course, were not provided by the 
KR given in the present study. When ques- 
tioned about the use they made of the knowl- 
edge of their scores, the KR Ss agreed unani- 
mously that it was of no use to them what- 
ever, and that they had therefore ignored it. 

These findings imply further that KR 
should affect performance most when goal 
setting is most facilitated (assuming S desires 
to improve). Clearly goal setting is easier 
when all the trials are of the same length 
than when they are all of different lengths; 
in the former case S has some standard by 
which he can judge and evaluate his per- 
formance and set new goals, whereas in the 
latter case he does not. Further, KR which 
leads to the setting of hard goals should be 
more effective than KR which leads to the 
setting of easy goals. The setting of hard goals 
could be facilitated by giving KR in relation 
to hard standards (e.g., S’s best previous 
score) and the setting of easy goals by giving 
KR in relation to easy standards (e.g., S’s 
worst previous score). 

The finding that KR has no effect on performance 
independent of goal setting is congruent 
with results obtained in studies of 
other incentives, including money (Locke, 
Bryan, & Kendall, 1968), time limits (Bryan 
& Locke, 1967), and “verbal reinforcement” 
(Dulany, 1968; Holmes, 1967; Spielberger, 
Bernstein, & Ratliff, 1966). These incentives 
were found to affect performance only if and 
to the extent that they affected Ss’ conscious 
goals and intentions. The evidence indicates 
that more attention should be paid to the 
goals and intentions which S develops in response 
to the incentives E provides. It would 
also be of interest to study how the form in 
which incentives are given affects the goals 
which Ss develop on tasks. 


OFF-QUADRANT COMMENT 

This discussion points out a serious methodological problem of quadrant 
analysis (QA), a method for identifying correlates of individual differences in 
predictability. QA involves division of the predictor-criterion scatterplot into 
four cells by cutting at the predictor and criterion medians. For cases below the 
predictor median, and separately, for cases above the predictor median, a 
comparison is made between the test responses and scores of those above the 
criterion median with those below. The distinguishing responses and scores 
are combined into moderator variables, one for each predictor group. To the 
extent of the relationship between the predictor and criterion, cases above 
and below the criterion median within each predictor category are not equiva- 
lent on predictor scores. Because of these differences, items or scores found 
to differentiate between comparison groups may merely reflect the predictor 
composite differences. Moderators developed through QA may therefore be 
predictive only of the composite predictor itself. 


The apparent ceiling in predictive validity, 
well documented by Ghiselli (1955), stimu- 
lated a search for new approaches to improve 
behavioral prediction. Primarily, this need was 
met with pleas for abandonment of the simple 
one-to-one predictor-criterion correlational 
model (see, e.g., Dunnette, 1963; Guetzkow 
& Forehand, 1961; Primoff, 1955). Perhaps 
the most intriguing, promising, and conveni- 
ent weapon in the struggle toward improving 
prediction was the concept of the moderator 
variable (Saunders, 1956). While several ap- 
proaches were offered (Banas, 1964; French, 
1961; Lykken, 1956; Marks, undated; Rim- 
land, 1960; Toops, 1959), Ghiselli’s (1956, 
1960) had the greatest appeal due to its 
directness of application and empirical sup- 
port. Briefly, this technique analyzes data to 
identify items or variables that correlate with 
absolute differences, Ds, between standard 
scores on the predictor and standard scores 
on the criterion. The appearance of this 
method resulted in a widespread resurgence 
of interest in moderating simple predictions. 
All at once it seemed that everyone analyzed 
his data for moderator variables, and it appeared 
from published reports that they were 
being found, but only as often as chance 
permitted. These discouraging results led 
many researchers to throw in the tear-stained 
moderator towel and search elsewhere for 
enhanced prediction. 

1 The opinions expressed are those of the author 
and do not necessarily reflect those of the Navy 
Department. The author is indebted to Marvin D. 
Dunnette, University of Minnesota, for his helpful 
suggestions on an earlier draft of this article. Requests 
for reprints should be sent to the author, 
United States Naval Research Activity, San Diego, 
California 92152. 
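
The D-score idea mentioned above (absolute differences between standard scores on the predictor and on the criterion) can be illustrated briefly. This is a sketch of the definition as stated in the text, not a reproduction of any published scoring routine; the function names are hypothetical, and the population standard deviation is an arbitrary choice for the example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to standard (z) scores."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def d_scores(predictor, criterion):
    """Absolute difference between each case's predictor and criterion z scores."""
    zx, zy = standardize(predictor), standardize(criterion)
    return [abs(a - b) for a, b in zip(zx, zy)]
```

A case with a small D is well predicted; items that correlate with D across cases would be candidates for a moderator scale under this approach.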

Unlike many less Spartan researchers, how- 
ever, Hobert (1965; Hobert & Dunnette, 
1967) remained undaunted when the D-score 
strategy failed to produce positive results. 
Instead, Hobert developed an alternative 
procedure, named quadrant analysis, for dis- 
covering correlates of individual differences in 
predictability. Using managerial performance 
as the criterion and a composite score as 
the predictor, he partitioned the predictor- 
criterion scatterplot as shown in Figure 1 into 
four quadrants by first cutting at the overall 
predictor median and then on the criterion 
score median. Next, since the relationship was 
positive (r= .70), he labeled the upper-left 
and lower-right quadrants as underpredicted 
and overpredicted, respectively. Analogously, 
the upper-right and lower-left quadrants were 
labeled as high hits and low hits, respectively. 
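
The partition into quadrants is easy to state in code. The following is a minimal sketch, assuming cases that fall exactly on a median are placed on the low side (the text does not say how ties were handled); `quadrant_labels` is a hypothetical name.

```python
from statistics import median


def quadrant_labels(predictor, criterion):
    """Label each case by its quadrant in the predictor-criterion scatterplot."""
    x_med, y_med = median(predictor), median(criterion)
    labels = []
    for x, y in zip(predictor, criterion):
        high_x, high_y = x > x_med, y > y_med  # ties count as low (assumption)
        if high_x and high_y:
            labels.append("high hit")
        elif not high_x and not high_y:
            labels.append("low hit")
        elif high_x:
            labels.append("overpredicted")   # high predictor, low criterion
        else:
            labels.append("underpredicted")  # low predictor, high criterion
    return labels
```

The moderator search then contrasts "low hit" with "underpredicted" cases, and "high hit" with "overpredicted" cases.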

In attempting to uncover a moderator vari- 
able that would discriminate between the “low 
hits” and the “underpredicted” cases, Hobert 
contrasted the two groups’ item responses and 
test scores on predictors in the original pre- 
dictor set. Similarly, the high hits and over- 
predicted managers were compared on the 
same responses and scores. For each comparison, 
differences between the groups were 
combined into a “moderator” that hopefully 
would provide enhanced predictions on future 
subjects. This procedure produced two moder- 
ators, one for low- and one for high-predictor 
cases. The moderators were cross-validated on 
hold-out samples and appeared to improve 
prediction. 

In comparison with other approaches, 
Hobert cited several potential advantages for 
identifying moderator variables in this man- 
ner. First, under- and overpredicted cases 
are not considered jointly as they are in the 
D-score approach. One moderator for over- 
predicted and one for underpredicted cases 
are constructed. Since failure is generally more 
predictable than success, it may be expected 
that differential validities would occur for 
high- and low-predictor cases, and quadrant 
analysis can take account of such differences. 
Thus, the technique does not involve dilution 
of possible differences between high- and low- 
predictor errors, unavoidable with most other 
approaches. Finally, from a practical stand- 
point, the four subgroups in quadrant analy- 
sis are obtained easily, requiring no D-score 
type computations. 

This procedure, quadrant analysis, while 
intuitively logical and appealing, incorporates 
a serious shortcoming that could easily affect 
the results. This shortcoming may best be 
understood by referring to the scatterplot in 
Figure 2. 

The shortcoming of this procedure is that 
mean differences on the predictor composite 
between comparison groups are ignored. For 
the low-predictor groups, that is, low hit and 
underpredicted, predictor means have been 
estimated by inspection and are indicated in 
Figure 2. Similarly, for high hits and over- 
predicted groups, approximate mean predictor 
scores are indicated. 

Obviously, there are sizable differences 
on the predictor-score means which are ig- 
nored when underpredicted and overpredicted 
groups are contrasted with high and low hits. 
Differences found in item analyses may 
merely reflect these predictor differences. In 
other words, the so-called moderator variable 
may only be predictive of differences on the 
predictor variable itself. At the very best, it 
appears that the components of the moderator 
variable would reflect a combination of dif- 
ferences, that is, actual differences between 
hits and misses and spurious differences due 
to predictor-score differences. 

Thus, quadrant analysis virtually insures 
finding differences between hits and mis- 
predicted cases. When items and variables 
involved in the original predictor set are 
searched for possible inclusion in the moder- 
ator scale, as in the Hobert study, it is further 
assured that differences will be found. And, 
the differences found will most likely be those 
already receiving the greatest weight in the 
original predictor composite. Since the vari- 
ables showing such differences will be so 
highly related to the predictor, it is doubtful 
that such a moderator should be expected to 
have value beyond that already shown in the 
predictor-criterion relationship. The net effect 
of using data in the original predictor set 
as a moderator is equivalent to changing the 
weighting used to obtain the composite score. 
Hobert and Dunnette’s (1967) suggestion 
that moderators using items outside the origi- 
nal predictor set may be more efficient should 
be taken seriously in subsequent applications 
of the quadrant approach. However, this sug- 
gestion does not remove the problem imposed 
by the built-in composite-predictor score dif- 
ferences between comparison groups. Perhaps 
this difficulty could be overcome by matching 
comparison groups on the predictor-composite 
score. 

The problem of improving prediction with 
moderator variables is an interesting and im- 
portant one since in practical situations pre- 
dictive accuracy is rarely high. Hobert has 
clarified some of the disadvantages of existing 
moderator approaches and suggested an inter- 
esting alternative, but a solution to the 
methodological limitations of quadrant analy- 
sis is yet to be found. 


MODERATION OF A MODERATOR TECHNIQUE 


QUINN McNEMAR 1 


University of Texas at Austin 


A modification of a method proposed by Hobert and Dunnette (1967) for 
finding moderator variables is developed and then illustrated by way of an 
idealized set of data. It is shown that in terms of increased validity coefficients, 
decreased percentage of overlap, and increased hit rate the modification, which 
is simple and very easy to use, leads to marked improvement over the original 
Hobert and Dunnette method. Then the seriously misleading nature of this 
search for moderators, shared alike by the original and modified versions, is 
pointed out. 


The paper by Hobert and Dunnette (1967) 
reports the dissertation efforts of Hobert to 
use quadrant analysis, an operational method 
for finding moderator variables that emerged 
from Dunnette’s (1963, 1966) modification 
of a model proposed by Guetzkow and Fore- 
hand (1961) for test validation and/or pre- 
diction. There is no need to describe the 
model here, but a brief statement of quadrant 
analysis is required in order to understand a 
paradox that can arise from its usage. 

Imagine the bivariate scattergram for a 
criterion, Y, and a predictor, X. By cutting 
at, say, the medians the resulting four-fold 
table permits a classification of individuals 
into “high hits” (high predictor, high cri- 
terion, in upper-right quadrant), “low hits” 
(low predictor, low criterion, in lower-left 
quadrant), “overpredicted” (high predictor, 
low criterion, in lower-right quadrant), and 
“underpredicted” (low predictor, high cri- 
terion, in upper-left quadrant). Now consider 
the two low-predictor groups, the low hits 
and the underpredicted. Obviously, they will 
differ rather markedly on the criterion, or Y, 
variable, whereas (it is asserted that) the 
two groups “have common predictor,” or X, 
scores. (By symmetry, similar characteriza- 
tion would hold for the two high-predictor 
groups.) If items or variables could be found 
that differentiate the low hits and under- 
predicted groups, one would have the where- 
withal for identifying those that would be 
underpredicted by X. Thus a “moderator” 
variable would have been found. 

1 Requests for reprints should be sent to the author, 
Department of Psychology, University of Texas, 
Mezes Hall 211, Austin, Texas 78712. 

By using this quadrant analysis, Hobert 
and Dunnette (1967) did find scales that 
when used as a moderator test eliminated 
25% as unpredictable and raised the validity 
from .65 to .73. This apparent success in the 
chase for moderator variables does, however, 
need further scrutiny before too much effort 
is spent thereon. 

In order to examine the situation, a two- 
way symmetric frequency table (Figure 1) 
was generated for N = 1,000, by use of 
Pearson’s tables, for a normal bivariate cor- 
relation of .70 which was chosen because of 
its nearness to the validities in the Hobert 
and Dunnette study. Actually, the calculated 
r for Figure 1 is .687, which deviates from 
.70 because of rounding of the frequencies, 
ignoring cases that deviate more than 2.7 
sigmas, and the grouping error (Sheppard’s 
correction raises the r to .692). For convenience, 
both X and Y are represented by 
scores ranging from 1 to 18, Ms = 9.5, 
SDs = 3.296. 

It is relevant to report some statistics 
(actually parameters since one is dealing with 
a theoretic bivariate distribution). For all of 
the low-predictor cases (left half of scatter): 
M for X = 6.84, M for Y = 7.68; the SDs 
are 1.95 and 2.75, respectively. For the underpredicted 
cases (upper left), M for X = 7.87, 
M for Y = 11.13; for the low hits (lower 
left), M for X = M for Y = 6.49. The large 
differences between the Y means would cer- 
tainly be anticipated, but these two sub- 
groups did not achieve exactly “similar 
predictor” (X) scores. The difference in 
means, 7.87 - 6.49, is sufficiently large that if 
based on samples even as small as 10 and 30 
(proportional to quadrant frequencies in Figure 1) 
significance would be claimed at the 
.001 level, two-tailed test. 
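
The same point can be illustrated by simulation rather than by a tabled distribution. The sketch below draws continuous bivariate normal scores with a correlation of .70 and the stated mean and SD, then compares the predictor means of the two low-predictor quadrants. The sample size, the seed, and the function name are arbitrary assumptions, and the simulated means will not match the grouped-table figures exactly.

```python
import random
from statistics import mean, median


def simulate_quadrant_means(n=20000, rho=0.70, m=9.5, sd=3.296, seed=0):
    """Mean predictor score (X) of underpredicted cases and of low hits."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Build Y so that corr(X, Y) equals rho in the population.
        zy = rho * z1 + (1 - rho ** 2) ** 0.5 * z2
        xs.append(m + sd * z1)
        ys.append(m + sd * zy)
    x_med, y_med = median(xs), median(ys)
    under = [x for x, y in zip(xs, ys) if x <= x_med and y > y_med]
    low_hits = [x for x, y in zip(xs, ys) if x <= x_med and y <= y_med]
    return mean(under), mean(low_hits), len(under), len(low_hits)


mean_under, mean_low, n_under, n_low = simulate_quadrant_means()
```

In such a simulation the underpredicted cases have a clearly higher mean on X than the low hits, and low hits outnumber underpredicted cases by roughly three to one, in line with the 10 and 30 mentioned in the text.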

From the fact that the low hits and under- 
predicted subgroups differ not only on Y, the 
criterion, but also on the predictor (X), it 
follows that any items or variables that dif- 
ferentiate between these two groups may do 
so for either of two obvious reasons. This 
would seem to be advantageous in that dis- 
criminating items (and/or variables) for use 
in a moderator test need not be uncorrelated 
with the predictor variable, X. (If it were 
known that this type of search did lead to 
a moderator that is uncorrelated with X, it 
would of course be good strategy to combine 
it with X by multiple regression in order to 
enhance the prediction of Y.) 

Now Hobert and Dunnette searched for, and 
found, discriminating items (and scales) right 
in the pool of items (and scales) making up 
the predictor battery. Unweighted scoring of 
these items (and scales) produced an “item” 
and a “scale” moderator test. Both yielded 
correlations with the criterion, but the scale 
moderator held up better under cross-validation. 
With a test that does “discriminate 
between two subgroups of individuals 
both of whom attained similar scores on a 
test battery, but different criterion scores 
[p. 57],” their next step was to use it to 
identify those for whom predictions based 
on the test battery will be incorrect. Getting 
rid of these would leave those that are more 
predictable. The method of identification will 
be given below. 

This reuse of items and scales from the 
original battery, with no requirement that the 
moderator test thus built be uncorrelated with 
the predictor (i.e., the original battery), led 
the present author to hypothesize that the 
predictor itself would prove to be a moderator. 
If so, one could forget the search for items 
for the moderator test, and assuming cross- 
validation for the predictor one could also 
forget about cross-validation for the moder- 
ator (since there is no capitalization on 
chance at this stage). 

Refer again to Figure 1 in which it is 
readily seen that some correlation must hold 
for the 500 cases in the left half of the 
scattergram. Straightforward calculation leads 
to an r of .49 (a value which may be regarded 
as a parameter—not subject to sampling 
error). Within the half, the regression for 
criterion on the “moderator” is exactly linear, 
and the fact that the other regression is curvi- 
linear is of no relevance. Obviously the 
“moderator” discriminates between the under- 
predicted and the low-hits group, hence it 
can be used to identify some of those that 
would be unpredictable. (The low hits, as 
well as the high hits, are regarded as the 
predictables.)

The procedure used by Hobert and Dun- 
nette is next followed for ascertaining a 
cutting score on the “moderator” as a basis 
for eliminating (some) unpredictable cases. 
Because of space limitation, the reader will 
need to refer to Hobert and Dunnette for 
detail. Suffice it to say here that the method 
leads to a dichotomizing cutting score for 
optimal classification as to group member- 
ship. For the present example, the cutting 
score becomes 7.5; that is, those in the under- 
predicted and low-hits groups scoring 8 or 
9 are candidates for elimination as being
unpredictable. 

What of the unpredictables among the 
high-predictor group? Hobert and Dunnette 
entertained the idea that items (and scales) 
that differentiate between the two low- 
predictor groups may not best differentiate 
between the high-hits and overpredicted
groups, hence they proceeded to find items 
(and scales) for a second moderator test. 
The present author hypothesized that his 
proposed “moderator,” the predictor variable, 
would do equally well in both situations. It 
does: r= .49 for right-half cases, and the 
cutting score is 11.5. Thus, those with scores of 
less than 11.5 (i.e., 11 and 10) become identi- 
fied as some of the “unpredictables” among 
the right-half cases. 

When one brings together the results of the 
proposed cutting scores, it is seen that from 
the entire group those with scores of 8, 9, 10, 
and 11 on the “moderator” will be eliminated, 
which is equivalent to sweeping out all the 
cases in the four middle vertical, shaded, 
arrays of the scattergram (Figure 1). (Recall 
that the “moderator” is identical with the 
predictor.) The percentage eliminated is 45.2, 
as compared to only 25% in the Hobert and 
Dunnette study. 

But the percentage eliminated is not the 
ultimate criterion for judging the worth of a 
moderator. What happens to predictive valid- 
ity when the moderator is used? This, ac- 
cording to Hobert and Dunnette, may be 
specified in terms of increased correlation, in- 
creased percentage of hits, and decreased 
percentage of overlap. From the distributions 
on the criterion variable for the two groups, 
the low predicted and high predicted after 
elimination by way of the moderator tests, 
they computed a point-biserial r as .73,
whereas the point-biserial for the entire group 
was .65. For the present “data,” it is found 
that the point-biserial is increased from .55 
to .72. As for hits, Hobert and Dunnette 
found an increase of either 4% or 8% (a 
complication here—perhaps 6% is a good 
estimate); here an increase of 11% was ob- 
tained. For the present example the Tilton 
(1937) percentage of overlap was reduced 
from 50 to 30, as compared to a drop from 
38 to 28 found by Hobert and Dunnette. 
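
The point-biserial coefficient cited above can be sketched as follows. This is a minimal illustration, assuming a 0/1 code for the low-predicted and high-predicted groups and a continuous criterion; the data and the names `demo_group` and `demo_score` are hypothetical and are not taken from either study.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(group, score):
    """Point-biserial r between a 0/1 group code and a continuous score."""
    n = len(score)
    hi = [y for g, y in zip(group, score) if g == 1]
    lo = [y for g, y in zip(group, score) if g == 0]
    p = len(hi) / n          # proportion in the group coded 1
    q = 1.0 - p              # proportion in the group coded 0
    mean_all = sum(score) / n
    # Population SD (divide by n) so the result equals Pearson r exactly.
    sd_all = math.sqrt(sum((y - mean_all) ** 2 for y in score) / n)
    return (sum(hi) / len(hi) - sum(lo) / len(lo)) / sd_all * math.sqrt(p * q)

# Hypothetical criterion scores for a low-predicted (0) and high-predicted (1) group.
demo_group = [0, 0, 0, 0, 1, 1, 1, 1]
demo_score = [2, 3, 3, 4, 4, 5, 6, 7]
r_pb = point_biserial(demo_group, demo_score)
print(f"point-biserial r = {r_pb:.3f}")
```

The coefficient is simply the product-moment r with one variable dichotomized, which is why it responds to removal of middle cases in the same way as r itself.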

The evident fact that use of the predictor 
itself as a “moderator” is better than using 
items or scales selected from a predictor 
(battery) may or may not come as a surprise. 
Obviously, less effort is involved: there is 
no item analysis or search for scales, with
subsequent need for separate cross-validation
of the resultant moderator test. However, 
cross-validation of the predictor battery may 
still be required, with the cutting scores on 
the predictor as “moderator” based on the 
cross-validation group. 

The present author’s proposal would appear 
to be a simple breakthrough in methodology 
of prediction unless one raises the simple 
question, Do such moderator variables in- 
crease accuracy? From the foregoing, the 
answer seems to be a strong “yes,” but one 
should take a further look. A count of the 
frequencies in the two hits quadrants of 
Figure 1 indicates that for all cases the per- 
centage of hits is 74.4 as compared to 85.4 
for those 548 remaining after use of the 
“moderator” to eliminate some of the un- 
predictables. Now of these 548, the total 
number of hits that can be claimed is 468, or 
a mere 46.8% of the original starting group. 
Instead of a gain of 11% there is a loss of 
27.6% in the hit rate! It is estimated that 
Hobert and Dunnette’s gain of about 6% is 
actually a loss of about 15%. 
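
The arithmetic behind the reversal can be laid out explicitly. The sketch below uses only figures quoted in the text; the total of 1,000 cases is inferred from the statement that 468 hits are 46.8% of the starting group, and the variable names are illustrative.

```python
n_total = 1000          # all cases in the scattergram (inferred: 468 hits = 46.8%)
pct_eliminated = 45.2   # cases scoring 8 through 11 on the "moderator"
hit_rate_all = 74.4     # percentage of hits before any elimination
hit_rate_kept = 85.4    # percentage of hits among the cases retained

n_kept = round(n_total * (1 - pct_eliminated / 100))   # 548 cases retained
hits_kept = round(n_kept * hit_rate_kept / 100)         # 468 hits claimed
hit_rate_on_total = 100 * hits_kept / n_total           # hits as % of all cases

apparent_gain = hit_rate_kept - hit_rate_all            # based on retained cases
actual_change = hit_rate_on_total - hit_rate_all        # based on the total N

print(f"retained cases:        {n_kept}")
print(f"hits among retained:   {hits_kept}")
print(f"apparent gain:         {apparent_gain:+.1f} points")
print(f"change on total N:     {actual_change:+.1f} points")
```

The two percentages differ only in their denominator: the gain is computed on the 548 retained cases, the loss on all 1,000.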

What is going on here is obvious. One can 
always pull out cases from the central portion 
of the distribution on the predictor variable. 
If one dichotomizes on the basis of the pre- 
dictor both before and after sweeping out 
such cases, one can expect in general that 
biserial r will increase, the percentage of
overlap will decrease, and if the criterion is 
also dichotomized the percentage of hits will 
increase. Or if one dislikes information- 
wasting dichotomizing, one might calculate 
the product-moment coefficients. For the 
present example, r goes up from .687 to .780.
But, sad to relate, the error of estimate re- 
mains unchanged! This can be seen in Fig- 
ure 1 by examining the spread of scores 
within the vertical arrays about the regres- 
sion line. Neither does the gain in validity 
claimed by Hobert and Dunnette represent a 
reduction in the error of estimate despite the 
increase in correlation. 
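
A small simulation may make the point concrete. It is a sketch under stated assumptions, not a reanalysis of Figure 1: the data are simulated bivariate-normal scores with a population correlation of .70, roughly the middle 45% of the predictor distribution is swept out, and the error of estimate is measured about the regression line fitted to all cases.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def error_of_estimate(x, y, slope, intercept):
    """Root-mean-square deviation of y from the given regression line."""
    return math.sqrt(sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)) / len(x))

random.seed(0)
rho, n = 0.70, 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in x]

# Sweep out roughly the middle 45% of cases on the predictor.
xs = sorted(x)
lo_cut, hi_cut = xs[int(0.275 * n)], xs[int(0.725 * n)]
kept = [(a, b) for a, b in zip(x, y) if a < lo_cut or a > hi_cut]
xk = [a for a, _ in kept]
yk = [b for _, b in kept]

slope, intercept = fit_line(x, y)
r_full, r_trimmed = pearson_r(x, y), pearson_r(xk, yk)
se_full = error_of_estimate(x, y, slope, intercept)
se_trimmed = error_of_estimate(xk, yk, slope, intercept)

print(f"r, all cases:             {r_full:.3f}")
print(f"r, middle removed:        {r_trimmed:.3f}")
print(f"error of estimate, all:   {se_full:.3f}")
print(f"error of estimate, kept:  {se_trimmed:.3f}")
```

Removing the middle cases inflates the predictor variance, which raises r, while the scatter about the regression line within any vertical array is left as it was.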

Anyone who maintains that the apparent 
gain in hit rate should not be converted to 
a loss by basing the percentages on the total 
N should keep in mind that to have the gain 
one must decline to make predictions for 25% 
(Hobert & Dunnette data) or 45% (present
“data”) of the cases. If the boss will stand for
this, maybe he would allow the applied psy- 
chologist to attain a 100% hit rate by the 
simple expedient of refusing to make predic- 
tions for 93.2% of the cases when the correla- 
tion between predictor and criterion is .70. 

AND JOB-ORIENTED VERBS*

The purpose of the present study was to determine if a worker-oriented vs.
job-oriented (orientation) continuum was unidimensional. Previous research 
indicated that verbs from job descriptions could be scaled along the orienta- 
tion continuum. The multidimensional method of successive intervals was 
applied to a sample of 20 previously scaled verbs. From the responses of 50 
college students instructed to consider the verbs in terms of jobs five orthogonal 
dimensions were obtained. It was concluded that the orientation continuum 
existed, but that it was too complex to be considered unidimensional. Research 
based on actual observed job behavior is needed to establish the generality of 
the orientation continuum and to determine if it is an adequate construct on
which to base indirect validity.


McCormick (1964) has attempted to de- 
velop an approach to job analysis that would 
be broadly based, objective, and quantitative. 
Such an approach could potentially fulfill the 
demands implied by the concept of synthetic 
validity (Balma, 1959; Lawshe, 1952; Mc- 
Cormick, 1959). The general idea may also be 
relevant to the establishment of task taxono- 
mies (Fitts, 1964; Gagné, 1962; Guion, 
1965). Much research is necessary, however, 
before a strategy such as that of McCormick 
and his associates (McCormick, Cunningham, 
& Gordon, 1967; McCormick, Cunningham, 
& Thornton, 1967; Peters & McCormick, 
1966) becomes operational. 

One way to make the results of job analyses 
widely applicable and comparable across jobs 
and industries would be to discover common 
elements or common denominators among 
jobs. McCormick (1959) has referred to the 
end result of such a process as indirect valid- 
ity. Such terms as generalized validity and 
synthetic validity (Balma, 1959; Lawshe, 
1952) also incorporate the conceptual strategy
used here of applying validity established in
one situation to another.

1 This article is based on a master’s thesis com-
pleted at the University of Maryland under the di-
rection of C. J. Bartlett while the author was a
National Aeronautics and Space Administration Fel-
low—Fellowship No. NsG(T)-3, #4. Computer time
was made available through the facilities of the
Computer Science Center of the University of Mary-
land.

2 Requests for reprints should be sent to the au-
thor, Department of Psychology, Morrill Hall, Uni-
versity of Maryland, College Park, Maryland 20742.

Central to McCormick’s approach is the 
dimension of worker-oriented as contrasted 
with job-oriented work elements. Job-oriented 
elements describe what is accomplished by the 
worker, refer to technological aspects of the 
job, and include knowledge of the job. Un- 
like worker-oriented elements, job-oriented 
elements tend to be specific to a job or a 
limited class of jobs. 

Worker-oriented elements describe in be- 
havioral terms what the worker does to ac- 
complish his objective. Since worker-oriented 
elements involve commonly observed human 
behaviors, they are potentially applicable to 
an unlimited range of jobs and may be use- 
ful as common denominators among jobs. A 
worker-oriented element, using a verb as an 
example, is LISTENING which might be equally 
appropriate to describe an element of behav- 
ior displayed by such diverse workers as a 
sonar operator, a piano tuner, or a music 
critic (Gordon & McCormick, 1962). 

The present research was suggested by the 
results of a study related to McCormick’s 
approach to job analysis and, more specifi- 
cally, to the semantic aspects of the worker- 
oriented versus job-oriented (orientation) 
continuum (Gordon, 1961; Gordon & Mc- 
Cormick, 1962). 

The main purpose of the original research 
by Gordon and McCormick (1962) was to 
determine if work-related verbs could be 
differentiated along the orientation continuum
in a meaningful way. Thus, the original study 
was primarily intended to contribute to the 
justification of the use of the continuum with 
the more complex stimuli studied in subse- 
quent research on job analysis (McCormick, 
1964). 

Gordon and McCormick (1962) obtained a 
sample of 1,000 verbs from job descriptions. 
Through sorting and rating procedures, 
judges eliminated 700 verbs as inapplicable to 
the continuum or strictly job oriented. The 
300 remaining verbs were rated on a 6-point 
scale which varied from worker oriented to 
job oriented. An additional category was pro- 
vided so that judges could indicate that a 
verb was ambiguous or fell outside the con- 
tinuum. The investigators (Gordon & Mc- 
Cormick, 1962; McCormick, 1964) concluded 
on the basis of the verb study that: (a) The 
distinction between job-oriented and worker- 
oriented verbs can be made by both naive and 
skilled judges with acceptable reliability es- 
pecially when judgments of a number of 
judges are pooled; (b) verbs related to hu- 
man work activities range over a continuum 
as far as their connotations of worker oriented 
versus job oriented are concerned; and (c) 
while the distinction between worker-oriented 
and job-oriented verbs is not definite, the dif- 
ference is sufficient to be useful in describing 
human work in worker-oriented terms. 

TABLE 1

VERBS SELECTED FOR USE IN THE PRESENT STUDY

Job-oriented    Worker-oriented    Ambiguous
verbs           verbs              verbs*

BURNS           CONCENTRATES       SCRUTINIZES
CURDLES         READS              INTERPRETS
COAGULATES      TOUCHES            DEPICTS
CARBONIZES      LOOKS              IMAGINES
RECEIVES        FEELS              AWAKENS
ADHERES         SEES
AGITATES
ANCHORS
IMMERSES

* At least 25% of the judges placed these five worker-
oriented verbs in an “ambiguous” category.

The procedure of rating the meaning of
verbs in terms of the orientation continuum
assumed that the meaning of the verbs was
unidimensional. That assumption overlooked
the possibility that the meaning of the verbs
with respect to their relevance to jobs could 
be accounted for by more than one dimen- 
sion. This possibility was supported by the 
fact that about 20% of the worker-oriented 
verbs were rated ambiguous by many judges 
and also by the large dispersion of the rat- 
ings of many of the verbs. Further support 
for the multidimensional nature of words in 
general comes from the studies performed by 
Deese (1965), Reeb (1959), and Root 
(1962). Research by Osgood and his associ- 
ates (Osgood, Suci, & Tannenbaum, 1957) 
and Andrews and Ray (1957) supports the 
idea that the judgment of words is usually 
multidimensional. 

For these reasons some approach for de- 
termining the dimensionality of a sample of 
the verbs used in the original verb study 
(Gordon & McCormick, 1962) seemed justi- 
fied. The multidimensional method of suc- 
cessive intervals appeared to be appropriate 
for this purpose. Discussion of multidimen- 
sional scaling procedures, such as those used 
in the present research, has been presented by 
Torgerson (1958) and Brown (1967) and 
will not be repeated here. 

The purpose of the present study was to 
determine whether or not the worker-oriented 
versus job-oriented continuum could be con- 
sidered unidimensional. Accordingly, the re- 
search hypothesis stated that a multidimen- 
sional scaling analysis of a sample of verbs 
previously scaled along the orientation con-
tinuum would identify three types of dimen-
sions: (a) one worker-oriented versus job-
oriented dimension, (b) dimensions logically
related, but orthogonal to the orientation di- 
mension, such as a dimension incorporating 
verbs rated ambiguous in the previous study, 
and (c) orthogonal dimensions not directly 
related to the orientation dimension such as 
the relative generality or frequency of the 
verbs. 


METHOD


Stimuli were selected from a list of 300 verbs 
reported by Gordon and McCormick (1962). Verbs 
from the extreme ends of the worker-oriented versus 
job-oriented continuum were selected. Because of 
differences in frequency count based on the Thorn- 
dike-Lorge “G” count (Thorndike & Lorge, 1944), 
worker-oriented verbs with the lowest frequency and 
job-oriented verbs with the highest frequency were
used. Even so, the worker-oriented verbs tended to 
have higher frequency counts. Where a choice was 
possible, verbs with smaller dispersions on the orien- 
tation continuum were selected. The 20 verbs used in 
the present study are listed in Table 1. 

The 100 male students who participated in the 
study were fulfilling a course requirement in an 
introductory psychology course at the University of 
Maryland. The subjects (Ss) were randomly divided 
into two groups. Both Group A (N = 50) and Group
B (N = 50) performed the multidimensional scaling
task and then rated the verbs in terms of generality.
The 5-point rating scale for generality varied from
general to specific with only the end points identi-
fied.

For both tasks Ss were given booklets with writ- 
ten instructions and asked to work rapidly, but 
carefully, within a 1-hr. time limit. Effects due to 
the order of the stimuli in the scaling task were 
minimized by following a procedure proposed by 
Ross (1934, 1939). Further precautions were taken 
to control the position of the stimuli on a page and 
the location of pages within the booklets. The scaling 
task was always performed before the verbs were 
rated on generality. 

The first task for Group A was to rate 190 pairs 
of words constructed from 20 separate verbs on a 
7-point scale as a part of the multidimensional scal- 
ing procedure. Ratings were made on the basis of 
similarity as perceived by the rater. No specific con- 
text or set was introduced. The Ss were familiarized 
with the verbs by requiring them to select pairs of 
verbs to serve as anchors for the extreme ends of 
the similarity scale. Following the successive-intervals 
task, Ss were requested to rate the 20 verbs on the 
generality scale. 
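
The figure of 190 pairs follows from taking every unordered pair of the 20 verbs once, that is, 20 × 19 / 2. A brief sketch, using the verb list as it appears in Tables 2 and 3 (the variable names are illustrative):

```python
from itertools import combinations

verbs = [
    "SCRUTINIZES", "CARBONIZES", "CONCENTRATES", "RECEIVES", "INTERPRETS",
    "ADHERES", "READS", "CURDLES", "DEPICTS", "AGITATES",
    "IMAGINES", "COAGULATES", "TOUCHES", "BURNS", "LOOKS",
    "ANCHORS", "FEELS", "AWAKENS", "IMMERSES", "SEES",
]

# Every unordered pair of distinct verbs, each pair appearing once.
pairs = list(combinations(verbs, 2))
print(len(verbs), "verbs give", len(pairs), "pairs")
```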

The instructions used with Group A were supple- 
mented in Group B by instructing Ss to consider 
the stimulus pairs in relation to actions performed 
in “jobs in general.” First, Ss were required to rate 
190 pairs of verbs on a 7-point scale on the basis of 
similarity. The importance of the job context was 
stressed in the instructions. Then, Ss rated 20 verbs 
on the 5-point general versus specific scale as in 
Group A. 


RESULTS 


The raw data for the multidimensional 
scaling procedure were obtained by requiring 
each S to rate 190 stimulus pairs along a 7-
point scale of similarity. A computer program 
incorporating the Messick and Abelson 
(1956) additive constant solution was em- 
ployed to analyze the scaling data. A princi- 
pal-factor solution and a Varimax rotation 
were used in the solutions for both Groups A 
and B (Harman, 1960; Kaiser, 1958). 

TABLE 2

ROTATED FACTOR MATRIX FOR THE MULTIDIMENSIONAL SCALING ANALYSIS IN GROUP A

                            Factor
Stimulus          I      II     III     IV      V

SCRUTINIZES      .76    [?]    [?]    [?]    [?]
CARBONIZES      -.16    [?]    .54  -1.97    .00
CONCENTRATES     .88   -.03   -.08    [?]   -.05
RECEIVES        -.01   -.30    [?]    [?]    .03
INTERPRETS       .44   -.06   -.66    .36    .63
ADHERES         -.34  -1.33    [?]    [?]    [?]
READS           1.18    [?]    [?]    [?]    .27
CURDLES         -.20   -.19   1.77   -.42   -.21
DEPICTS          .20    [?]   -.36    [?]    .68
AGITATES        -.94    .88    .86   -.19   -.11
IMAGINES         [?]    .38   -.22    [?]    .51
COAGULATES      -.02   -.86   1.24   -.44   -.01
TOUCHES         -.49   -.36   -.35   -.22   -.04
BURNS          -1.11    .55    .14  -1.72   -.05
LOOKS            .55    [?]    [?]    .50    [?]
ANCHORS         -.28  -1.30   -.11   -.04  -1.39
FEELS           -.70    .17   -.44    .24    .38
AWAKENS          [?]   1.02    [?]    .76   -.04
IMMERSES         .28    [?]    [?]   -.06  -1.46
SEES             .44    [?]   -.52    [?]    .38

Note.—Instructions to S did not specify a job context.

For Group B, given job-context instruc-
tions, a break in the eigenvalues was used to
determine the number of dimensions to re-
tain. Then, the same number of dimensions 
was obtained for Group A. In order to justify 
this procedure, solutions with varying num- 
bers of dimensions were obtained for Group 
A. The procedure used to select the number 
of dimensions to retain in Group A led to only 
minor variations in the configurations of 
factor loadings. 

Five dimensions were employed to analyze 
the scaling results for Groups A and B. Based 
on the eigenvalues and communalities used in 
the principal-axis solution with no rotations, 
five dimensions accounted for 48% of the 
common variance in Group A and 55% in 
Group B. The rotated factor matrices using 
five-dimensional solutions for Groups A and 
B are presented in Tables 2 and 3. 

The interpretation of dimensions was lim- 
ited to results from Group B (see Table 3), 
since that group was given instructions in- 
tended to induce a “‘job-context”’ set. Since an 
orthogonal rotation to a Varimax criterion 
was used, all of the dimensions were orthog- 
onal. The factors were interpreted and named 
as follows: 

Dimension I—Type of Worker Involvement. 
This dimension appeared similar to McCor- 
mick’s orientation continuum. Verbs at the 
positive pole were from the worker-oriented 
(see Table 1) and ambiguous categories. The
verbs were READS, LOOKS, INTERPRETS, CON-
CENTRATES, and SEES. Words at the opposite
pole were from the job-oriented category.
These included CARBONIZES, BURNS, AGITATES,
and CURDLES.

TABLE 3

ROTATED FACTOR MATRIX FOR THE MULTIDIMENSIONAL SCALING ANALYSIS IN GROUP B

                            Factor
Stimulus          I      II     III     IV      V

SCRUTINIZES      [?]    .39    [?]   -.55    .09
CARBONIZES     -1.96   -.19    [?]    .00   -.04
CONCENTRATES     .80   -.18    .63   -.14    .16
RECEIVES         .10   -.43   -.71   -.08   -.20
INTERPRETS       .82   -.20   -.13   -.58   -.50
ADHERES          .02  -1.34   -.16    .70    .16
READS            .94   -.07    .09   -.53   -.09
CURDLES         -.67    .09    [?]   1.33   -.18
DEPICTS          .26   -.10   -.06  -1.00   -.65
AGITATES        -.79   1.28   -.06    .56    .11
IMAGINES         .31    [?]    .14   -.98   -.22
COAGULATES      -.46   -.61   1.19    .90   -.32
TOUCHES         -.19   -.04  -1.01    [?]   -.03
BURNS          -1.82    .58   -.18    .29    .24
LOOKS            .83    .55   -.06   -.31   -.25
ANCHORS          .10  -1.22   -.12    [?]   1.26
FEELS            .00    .13   -.78   -.08   -.09
AWAKENS          [?]   1.04   -.67    .18   -.25
IMMERSES        -.23   -.08    .25    .18   1.30
SEES             .79    .24   -.18   -.49   -.50

Note.—Instructions to S specified a job context.

Dimension II—Actions Related to Motion. 
All of the words with high loadings on this 
dimension except AWAKENS were job-oriented 
verbs. The dimension contrasted AcITATES and 
AWAKENS with ADHERES, ANCHORS, and COAG- 
ULATES. 

Dimension III—Degree of Personal In- 
volvement in Actions. When considered as a 
bipolar continuum, the dimension related to 
actions involving people or personal contact 
as opposed to actions involving impersonal 
manufacturing processes. Verbs with negative 
loadings were TOUCHES, FEELS, RECEIVES, and 
AWAKENS; verbs with positive loadings were 
COAGULATES, CURDLES, and CONCENTRATES. 
Note that RECEIVES, a job-oriented verb,
occurred with the worker-oriented verbs rather
than with the job-oriented verbs, as would have
been expected. The same type of reversal occurred
with CONCENTRATES, a worker-oriented verb.

Dimension IV—Degree of Mental Activ- 
ity. The negative pole had high loadings on 
DEPICTS and IMAGINES, while on the positive 
pole CURDLES, COAGULATES, and ADHERES were 
found. In contrast to Dimension I, where both
worker-oriented and ambiguous verbs ap-
peared at the same pole, this dimension had
only ambiguous verbs at one pole.

Dimension V—Marine Activities. Dimen- 
sion V had high loadings on IMMERSES and 
ANCHORS. At the opposite pole, DEPICTS had
a moderate loading. Unlike the other dimen- 
sions, Dimension V seemed to be unipolar. 

The reliability of the multidimensional scal- 
ing data was evaluated with a variation of 
the split-half technique. Groups A and B 
were randomly divided into two subgroups of 
25 Ss each. For both subgroups, the median 
similarity rating given each verb pair was 
determined. Then, the reliability estimate was 
obtained by correlating the median item rat- 
ings for the sample halves and making the 
usual split-half correction for length. This 
procedure was repeated 20 times for Groups 
A and B to get mean values. 
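
The reliability procedure just described can be sketched as follows. The sketch assumes that the “usual split-half correction for length” is the Spearman-Brown formula, 2r / (1 + r); the ratings are simulated, and the function and variable names are illustrative rather than taken from the study.

```python
import math
import random
from statistics import median

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r):
    """Step a half-sample correlation up to the full sample size."""
    return 2 * r / (1 + r)

def split_half_reliability(ratings, n_splits, rng):
    """Mean corrected correlation of median pair ratings over random halves.

    ratings[s][p] is the similarity rating given by subject s to pair p.
    """
    n_subj, n_pairs = len(ratings), len(ratings[0])
    estimates = []
    for _ in range(n_splits):
        order = list(range(n_subj))
        rng.shuffle(order)
        half_a, half_b = order[: n_subj // 2], order[n_subj // 2:]
        med_a = [median(ratings[s][p] for s in half_a) for p in range(n_pairs)]
        med_b = [median(ratings[s][p] for s in half_b) for p in range(n_pairs)]
        estimates.append(spearman_brown(pearson_r(med_a, med_b)))
    return sum(estimates) / n_splits

# Simulated data: 50 subjects rate 190 pairs on a 7-point scale with noise.
rng = random.Random(1)
true_similarity = [rng.randint(1, 7) for _ in range(190)]
ratings = [
    [min(7, max(1, t + rng.choice([-2, -1, 0, 1, 2]))) for t in true_similarity]
    for _ in range(50)
]
reliability = split_half_reliability(ratings, n_splits=20, rng=rng)
print(f"mean corrected split-half reliability: {reliability:.2f}")
```

Averaging over 20 random splits reduces the dependence of the estimate on any one division of the 50 Ss into halves.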

The introduction of the job-context instruc- 
tions resulted in increased average reliability 
of the similarity ratings from .55 in Group A 
(no job context) to .73 in Group B (job con- 
text). On the basis of this increase in relia- 
bility, it was inferred that the job-context 
instructions actually affected the ratings in 
the intended direction. 

Rank correlations were calculated between 
the multidimensional scaling dimensions and 
the generality ratings obtained from Groups 
A and B. Dimensions III and IV in Group A 
had correlations exceeding .55, and Dimen- 
sion III in Group B had correlations exceed- 
ing .60. It is of particular interest that neither 
Dimension I nor IV in Group B was highly 
correlated with the generality ratings. Thus, 
the dimensions most similar to the orientation 
continuum could not be explained in terms of 
the generality or specificity of the verbs. 
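
A rank correlation of this kind can be computed as the product-moment correlation between ranks, with tied values given the average of the ranks they occupy. The sketch below uses hypothetical loadings and generality ratings for six verbs; none of the numbers come from the tables.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical factor loadings and mean generality ratings for six verbs.
loadings = [0.90, 0.40, -0.10, -0.60, 1.20, -1.50]
generality = [4.1, 3.0, 3.0, 2.2, 4.5, 1.8]
print(f"rank correlation = {spearman_rho(loadings, generality):.2f}")
```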

Ratings of perceived frequency of occur- 
rence of the same 20 verbs were obtained in 
independent groups similar to Groups A and 
B. The intercorrelations of frequency with all 
the dimensions were almost identical with 
those of the generality ratings. Frequency 
ratings correlated greater than .85 with the 
generality ratings, indicating little difference 
in these two aspects of the sample of verbs. 

In contrast to Group B, instructions to Ss 
in Group A did not mention a job context. 
The Ss approached the ratings of the verb 
pairs on similarity with whatever set might 
have seemed appropriate to them. The rea-
sons for obtaining data from Group A were
to: (a) determine if the job-context instruc-
tions in Group B had an effect and (b) use
any differences in the factors from Groups A 
and B to assist in interpreting the results 
from Group B. 

A comparison between the factor loadings 
given in Tables 2 and 3 for Groups A and B 
indicated that: (a) In terms of the verbs with 
the highest loadings, Dimensions II and V
were very similar in the two groups; and (b)
noticeable differences in a few factor loadings
appeared with Dimensions I, III, and IV.

The interrelationships among the dimen- 
sions obtained in the solutions for Groups A 
and B were complex, as indicated by Table 4. 
The factors that appeared to be most closely 
related on the basis of both configuration of 
loadings and rank correlations between factor 
loadings appear in the same row of that table. 
Dimension III in Group B had relatively 
weak relationships with the other dimensions, 
and no corresponding dimension in Group A 
is given in Table 4. 


DISCUSSION 


The first part of the research hypothesis 
was that a bipolar worker-oriented versus job- 
oriented dimension would be found. Dimen- 
sion I (Type of Worker Involvement) in 
Group B was found to be similar to that con- 
tinuum and confirmed the hypothesis. 

Dimension IV (Degree of Mental Activity) 
from Group B was relevant to the second part 
of the research hypothesis. Dimension IV in 
Group B had high loadings at one pole on 
verbs referring to mental activities and desig- 
nated ambiguous in the Gordon and McCor- 
mick (1962) study. It had loadings on job- 
oriented verbs at the opposite pole and had 
a configuration of stimuli similar to that of 
Dimension I. The rank correlation between 
the factor loadings on Dimensions I and IV
was —.70. The relationship found between 
these two independent dimensions was inter- 
preted to mean that the orientation continuum 
was too complex to be represented by a single 
dimension. 

The third part of the hypothesis predicted
that the generality or frequency of the verbs
would lead to additional dimensions. In Group
B, only Dimension III (Degree of Personal
Involvement in Actions) was related to these
aspects of the verbs. Here the relationship ap-
peared to be moderate as evidenced by a rank
correlation in the .60s.

TABLE 4

COMPARISON OF DIMENSIONS FROM GROUPS A AND B

              Corresponding dimension
Group B                                          Group A    Correlation

I    Type of Worker Involvement                  I           [?]
                                                 III        -.71
                                                 IV          .83
II   Actions Related to Motion                   II          .85
III  Degree of Personal Involvement in Actions
IV   Degree of Mental Activity                   I          -.66
                                                 III         [?]
                                                 IV         -.72
                                                 V          -.83
V    Marine Activities                           V          -.17

Note.—All correlations were Spearman rank correlation
coefficients based on N = 20.

The concepts underlying Dimensions II 
(Actions Related to Motion) and V (Marine 
Activities) for Group B seemed to be different 
from those represented by the other dimen- 
sions and were not accounted for by the re- 
search hypothesis. 

There was a tendency for worker-oriented 
and job-oriented verbs to fall together at op-
posite poles of the various dimensions. Ex-
ceptions, however, were noted with Dimen-
sions II and III.

The relatively low amount of common 
factor variance accounted for by five dimen- 
sions and the moderate reliability (.73) of the 
similarity ratings in Group B indicated that 
the distinction between worker-oriented and 
job-oriented verbs is not strong. Such a con- 
clusion is in agreement with the earlier re- 
search by Gordon and McCormick (1962). 

The results of the present study support 
McCormick’s (1959) proposal that a worker- 
oriented versus job-oriented continuum exists 
and that verbs can be scaled in terms of the 
continuum (Gordon & McCormick, 1962). A 
new finding was that the perceptions of Ss as
measured by the scaling technique involved 
five dimensions of which two appeared to be 
generally descriptive of the verb sample. Such 
an assertion was supported by the finding of 
a single dimension similar to the orientation 
continuum, plus the finding of a dimension 
with high loadings on verbs rated ambiguous 
in the previous study. Future research may 
show that the additional dimensions will vary 
with the context, Ss, and stimulus sample. The 
demonstration, however, that more than one 
dimension appeared in the analysis offered 
sufficient evidence that the worker-oriented 
versus job-oriented continuum is not unidi- 
mensional when verbs are used as stimuli. 
Thus, while the worker-oriented versus job- 
oriented dimension exists and it is legitimate 
to scale such stimuli as verbs along it, the 
continuum is much more complicated than 
would be desirable. 

A practical problem encountered while se- 
lecting worker-oriented verbs for the present 
study was the lack of a large number of such 
verbs with small dispersions on the continuum. 
The scarcity of a large number of worker- 
oriented elements with which to describe 
worker activities would seriously impair the 
usefulness of the worker-oriented concept as 
a basis for indirect validity in applied situa- 
tions. 

If a consistent theoretical and empirical 
basis for common denominators applicable to 
work behaviors or jobs is to be established, 
the basic units should be widely applicable, 
unambiguous, relatively free of extraneous in- 
fluences, and, ideally, unidimensional. The 
present study raises doubt as to the adequacy 
of the worker-oriented versus job-oriented ele- 
ments concept to serve as the common ele- 
ments upon which to base and infer indirect 
or synthetic validity. Research using job- 
analysis data based on actual observed job 
behaviors in a variety of contexts is needed 
to establish the generality and value of the 
orientation continuum. Neither the present 
research nor the extensive research by Mc- 
Cormick and his associates is sufficient to 
establish the usefulness of the worker-oriented 
versus job-oriented continuum. 

FATIGUE AND PERFORMANCE VARIABILITY
AMONG TYPISTS


LEONARD J. WEST 1

Investigation of typewriting fatigue and consistency of performance as a
function of level of typing skill employed 234 Ss at typing-skill levels from 5 
through 108 wpm. The Ss typed ordinary prose for 30 uninterrupted min. in 
a manner permitting scoring of the work for speed and number of errors in each 
minute individually and cumulatively. Although significant fluctuations from a 
constant level of performance were found at all skill levels, the extent of 
speed, but not error, fluctuations differed among the skill levels. Principally, 
decrements in quality of work as the work period progressed decreased with 
increase in skill level, while correlations between trials and speed for the 
various skill levels did not exhibit any regular trend. However, absolute 
decrements in performance during the work period were judged too small to 
be of any practical consequence and call into question the conventional practice 
of confining practice and test durations to 1-3 min. during the early months of 
typewriting training. Relative consistency of speed scores was found to increase 
with level of skill; however, except for those at the lowest skill levels (who 
were most inconsistent), relative stability of error scores decreased with in- 
crease in skill. Finally, while a 1-min. sample of performance furnishes a 
highly reliable measure of speed, error measures require at least a 5-min.
sample for adequate reliability.


Among the hitherto unanswered questions 
about work fatigue for a “light, sedentary 
task” like typewriting are whether (a) in- 
creased resistance to fatigue accompanies in- 
creases in skill, and (b) increases in skill are 
accompanied by increased stability of per- 
formance across segments of a long, continu- 
ous work period. The findings on these ques- 
tions are intended to contribute to a more 
complete picture of fatigue phenomena, while 
findings on the first question have a clear 
bearing on training practices for typists. 
These practices appear to be predicated on 
the supposition that typewriting is a fatiguing 
task—as inferred from the very short work 
periods characteristic of early months of train- 
ing. The tacit assumption seems to be that 
endurance is a function of high skill and is 
achieved through a gradual approach to 
longer durations of continuous work. 


1 This investigation was begun at Southern Illinois
University and completed at the City University of 
New York. The author wishes to thank Barbara 
Heller for supervising the scoring of the 350,000 
words typed by the subjects of this study. Requests 
for reprints should be sent to the author, Division 
of Teacher Education, City University of New York, 
33 West 42 Street, New York, New York 10036. 
These conventional suppositions are somewhat
called into question by existing generalizations
about fatigue. Ryan (1947), among
others, has pointed to negligible fatigue effects 
in “light, sedentary tasks,” with those effects 
being mostly on quality rather than on 
quantity of work. Chapanis, Garner, and 
Morgan (1949) have explained that per- 
formance is maintained under fatigue through 
increased expenditure of energy and increased 
motivation. In any event, fatigue effects vary 
with the task. 

Concerning typewriting fatigue, Enneis 
(1956) found no performance differences 
among employed typists for various typing 
tasks hypothesized to differ in effortfulness 
or between manual and electric machine oper- 
ators. Morgan (1954) found no pulse-rate 
differences for various typing tasks and vari- 
ous durations of work. Atwood (1964) used 
slide-camera equipment to photograph the 3- 
and 10-min. efforts of 30 first-semester typists, 
finding a drop of 2 wpm after the first minute, 
but trivial and nonsignificant fluctuations in 
speed and errors thereafter. Gilmer’s (1967) 
findings for 5- and 10-min. work intervals of 
60 first-year typists showed less fluctuation;
an initial decline in speed, followed by an end
spurt and a gradual increase in errors throughout
each of the two work periods. These patterns
were found to hold generally for the
various skill levels represented among his Ss.

These earlier investigations employed
rather modest durations of continuous work 
and used Ss at one or another particular level 
of skill. Accordingly, the primary purpose of 
the present investigation was to assess typing 
fatigue over a long (30-min.), continuous 
work period, as a function of level of skill 
from novice through expert. Do performance 
decrements (in speed and quality of work) 
accompany a long, continuous work period at 
the typewriter, and do these decrements, if 
any, vary with skill level? A second purpose 
of the present investigation was to assess 
variability or consistency of performance as 
a function of skill level. Is increasing stability 
of performance a concomitant of increasing 
skill? A third purpose of the present investi- 
gation, as a by-product of inquiry into fatigue 
and variability phenomena, was to obtain 
data on the reliability of measures of typing 
performance of various lengths. Earlier reli- 
ability data were confined to studies using 
different Ss for tests of various lengths. For 
ordinary copy work at the typewriter, speed 
reliabilities under test-retest or parallel-form 
conditions have typically been in the .80s and 
.90s for measures as short as 1 min. Error 
reliabilities, on the other hand, have typi- 
cally been in the .30s—.40s, even for 5- and 
10-min. measures (e.g., Martin, 1954; West, 
1956), and only occasionally in the .70s 
(West & Bolanovich, 1963). The question 
here is one of optimum length of a single ad- 
ministration: what are the correlations be- 
tween various cumulative segments of the 
work period and the full 30-min. work period?


METHOD 
Subjects 


Of a total N of 234 typists, ranging in skill from 
5 through 108 wpm, 183 were students in 15 dif- 
ferent high school and college typing classes at 
various stages of training in eight different schools. 
The remaining 51 persons (at the higher levels of 
skill) were mainly employed typists, but included a 
few typewriting teachers and several of the finalists 
in a national contest for high school typing cham- 
pions. The relevant population is one of levels of 
typing skill (as measured by gross stroking speed
during the 30 continuous min. of work of the present 
investigation), and the sample data are in terms of 
skill level, not stage of training or amount of work 
experience. 


Test Content and Procedures 


The 30 min. of continuous typing involved line- 
for-line copying of 2,625 words of continuous printed 
prose, composed so as to be of uniform difficulty at 
a syllabic intensity (mean number of syllables per 
word) of 1.40 (representing the conventional esti- 
mate of average difficulty of copy materials for 
vocational typists). Illustratively, the title of this 
article contains 6 dictionary words and 16 syllables, 
for a syllabic intensity of 16/6 or 2.67. The materials
were printed in triple columns on both sides
of an 8½ × 14 in. sheet, long edge horizontal. Accordingly,
there was no interruption for turning the
copy materials by those who typed less than 43 wpm, 
one turn for those who typed between 44 and 87 
wpm, and only two turns for still faster typists. To 
preclude interruptions during the work session for 
changing paper in the machine, paper of sufficient 
length was cut from teletype rolls. 
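
The arithmetic behind the copy materials can be sketched briefly. The following is a minimal illustration, not the author's procedure: the function names are hypothetical, and the turn-rate calculation assumes that the 2,625 words were divided evenly between the two sides of the sheet, which the text does not state outright.

```python
def syllabic_intensity(total_syllables: int, total_words: int) -> float:
    """Mean number of syllables per word."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return total_syllables / total_words


def max_wpm_without_turn(total_words: float, sides: int, minutes: float) -> float:
    """Rate at which one side of the copy lasts exactly the whole work period.

    Assumption (not stated in the source): the copy is split evenly
    across the sides of the sheet.
    """
    return total_words / sides / minutes


# The article title: 16 syllables in 6 dictionary words.
title_si = syllabic_intensity(16, 6)

# 2,625 words on two sides, typed for 30 min.
one_side_rate = max_wpm_without_turn(2625, 2, 30)

print(f"title syllabic intensity: {title_si:.2f}")
print(f"rate that exhausts one side in 30 min.: {one_side_rate:.2f} wpm")
```

Under the even-split assumption, the one-side rate falls between the 43 and 44 wpm boundaries given in the text for zero versus one turn of the copy.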

The Ss were instructed to aim for optimum overall 
performance, taking both speed and accuracy into 
account. To permit identification (by the investi- 
gator, who administered the testing to all Ss) of 
each minute of the 30 continuous min. of work 
without interrupting the typist, the work was done 
in single spacing, with a double throw of the type- 
writer carriage upon loud announcement (“throw- 
throw”) by the investigator at the end of each 
minute. Upon the announcement, Ss were instructed 
to double space instantly and, without further signal, 
to continue to type with the next line of the printed 
copy. That tactic substituted a voice stimulus for the 
usual carriage-throwing stimulus (of ringing bell or 
visual perception of line end in the copy materials) 
and was followed, as are the conventional stimuli, by 
immediate carriage throwing and immediate resump- 
tion of typing. The result was 30 sets of single- 
spaced lines, each set separated from the next by a 
blank line. To avoid overestimating the work of the 
first minute and over- or underestimating the work 
of the last minute of the 30, the first “throw-throw” 
(signaling the beginning of the first scorable minute) 
followed 5-15 sec. (half a line) of prior unscored 
copying. Similarly, the final minute was followed by 
“throw-throw” and a few seconds of unscored work 
in a thirty-first minute. In this fashion, individual 
reaction times to conventional starting and stopping 
signals were better controlled. 


Data and Analysis 


Raw data consisted of number of strokes and 
number of errors per minute, individually and 
cumulatively. An error was defined as any discrep- 
ancy from perfection, but followed the conventional 
practice of counting no more than one error per 
word, regardless of number of misstrokes in the 
word. (For descriptive purposes here, speed is reported
in words per minute, conventionally counting
each five typewriter strokes as one word.) The
234 Ss were classified, on the basis of gross stroking 
speed for the 30 min., into ten 10-wpm skill levels 
(5-14, 15-24, ... 85-94, 95-108 wpm) and into 
four 25-wpm skill levels (5-24, 25-49, 50-74, 75- 
108 wpm). For purposes of statistical analysis, equal- 
frequency samples of N = 160 (40 from each of the 
broader ranges) and N = 171 (19 from each of nine
narrower skill levels, combining 85-108 wpm into one 
cell) were drawn at random from the total pool of 
234 Ss. 
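
The scoring conventions just described (five strokes to the word, classification by gross speed over the full 30 min.) can be illustrated with a short sketch. The helper names are hypothetical, and rounding the speed to the nearest whole word per minute before classification is an assumption; the source does not say how fractional speeds at a boundary were assigned.

```python
from typing import Optional

# The four broad (25-wpm) skill levels named in the text.
BROAD_LEVELS = [(5, 24), (25, 49), (50, 74), (75, 108)]


def gross_wpm(strokes: int, minutes: float) -> float:
    """Gross words per minute, counting five strokes as one word."""
    return strokes / 5 / minutes


def broad_level(wpm: float) -> Optional[str]:
    """Label of the broad skill level containing this speed, or None.

    Assumption: speeds are rounded to the nearest whole wpm first.
    """
    rounded = round(wpm)
    for low, high in BROAD_LEVELS:
        if low <= rounded <= high:
            return f"{low}-{high}"
    return None


# Hypothetical typist: 9,000 strokes in the 30-min. work period.
speed = gross_wpm(9000, 30)
print(speed, broad_level(speed))
```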

On questions of fatigue, performance scores (words 
per minute and number of errors) were subjected to 
Levels × Trials analyses of variance, for 9 (and 4)
skill levels for 30 (and 6) trials; that is, 30 individual 
min. and 6 blocks of successive 5-min. segments of 
the total 30 min. Interlevel effects were estimated 
via correlations between trials (30 1-min. and six 
5-min. blocks) and mean performance scores for Ss 
in each of the four 25-wpm skill levels. The conven- 
tional assumption that endurance or resistance to 
fatigue is a function of increasing skill would lead to 
the expectation of positive, but decreasing, correla- 
tions between trials and mean number of errors 
and negative (and decreasing) correlations between 
trials and mean speed scores, with increase in skill 
level. 

On questions of consistency of performance, V, 
the coefficient of variation, was computed for speed 
and for number of errors for each S, and these Vs 
were subjected to one-way analyses of variance (by 
skill level). Although the underlying distribution of 


TABLE 1

MEANS AND STANDARD DEVIATIONS FOR SPEED AND
ERRORS IN 30 MIN. OF CONTINUOUS TYPING,
BY SKILL LEVEL

Skill level               Total errors
(in words per min.)       M         SD
5-14                      64.98     27.39
15-24                     99.12     36.44
25-34                    150.50     72.20
35-44                    149.84     75.35
45-54                    115.86     34.44
55-64                    128.07     74.52
65-74                    132.14     83.21
75-84                    109.95     50.52
85-94                    101.77     45.71
95-108                   102.00     43.79
Grand Ms and SDs a       118.15     64.87

[The words-per-minute columns of this table are not recoverable from the source.]

a These are based on a randomly selected 19 persons from
each of nine skill levels, combining 85-108 wpm into one cell,
that is, on distributions of speed and of error scores for N = 171.
FIG. 1. Mean words per minute in each of 30 consecutive
min. of typing, by skill level (N = 40 per
level).


V is not known, the question at issue demands a 
measure of relative, not absolute, variability. Use 
of the standard deviation would necessarily have 
resulted in larger standard deviations for the faster 
typists, thereby failing to furnish information on 
the question of variability in relation to output. 
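
The distinction between absolute and relative variability can be made concrete with a small sketch. The data below are hypothetical minute-by-minute speeds, not values from the study, and the choice of the population standard deviation, expressed as a percentage of the mean, is an assumption about how V was computed.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """V = 100 * SD / M, a measure of variability relative to output.

    Assumption: population SD, reported as a percentage of the mean.
    """
    m = mean(scores)
    if m == 0:
        raise ValueError("mean is zero; V is undefined")
    return 100 * pstdev(scores) / m


# Hypothetical per-minute speeds (wpm) for a slow and a fast typist.
slow_typist = [10, 12, 8, 10]
fast_typist = [80, 84, 76, 80]

sd_slow, sd_fast = pstdev(slow_typist), pstdev(fast_typist)
v_slow = coefficient_of_variation(slow_typist)
v_fast = coefficient_of_variation(fast_typist)

print(f"slow: SD = {sd_slow:.2f}, V = {v_slow:.2f}")
print(f"fast: SD = {sd_fast:.2f}, V = {v_fast:.2f}")
```

In this illustration the faster typist has the larger standard deviation but the smaller V, which is the situation the preceding paragraph describes.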


RESULTS AND DISCUSSION 


Results are presented, in turn, for (a) fatigue
effects, (b) consistency of performance
as a function of skill level, and (c) speed and 
error reliabilities for cumulative portions of 
the 30-min. work period. 


Fatigue Effects 


Table 1 contains descriptive data on the 
performance of all 234 persons, by skill level. 
The mean of 3.9 (118.15/30) errors per min- 
ute (epm) is nearly double the 2 epm found 
to be characteristic in surveys of the straight 
copy speeds of students in training (e.g., Rob- 
inson, 1967) for 5-min. test durations, and 
the difference is probably attributable to such 
factors as Ss’ set for the long work period, the 
novelty of the experimental situation for Ss 
(instructions to “throw-throw” after each 
minute), and the knowledge by those Ss who 
were students that their performance would 
have no effect on grades. 

The minute-by-minute gross words per 
minute and errors per minute means for N
= 160 (40 from each of 4 broader skill 
ranges) are displayed in Figures 1 and 2. 

Figures 1 and 2 show a slight general trend 
toward decrements in speed and increments in 
errors as the work period proceeds. By no
means are the changes large in an absolute 
sense, nor is there any abrupt shift toward 
decreased speed or increased errors. Mainly, 
the fluctuations from minute to minute in 
both speed and errors show that any cumula- 
tive fatigue for this typewriting task is, at 
most, faint. Instead, the fluctuations would 
appear to reflect temporary changes in energy 
expenditure as a means of managing and at 
least partly recovering from whatever fatigue 
may have accumulated during portions of the 
work period. 

Analyses of variance for speed and for er- 
rors were carried out for the four broader 
(25-wpm) and nine narrower (10-wpm) skill 
levels across both 30 (1-min.) trials and 6 
(successive blocks of 5-min.) trials, with 
results as shown in Table 2. 

As shown in Table 2, the eight obtained Fs 
for trials were uniformly significant (p< 
.01), showing, for both broad and narrow skill 
ranges for both 1-min. and 5-min. segments 
of the total 30-min. work period, that speed 
and error scores departed significantly from 
an identical level throughout the work period. 
The obtained Fs for the Levels × Trials interaction
(significant for speed but not for
errors) show that speed fluctuations varied 
from one skill level to another, whereas error 
fluctuations were similar across all levels of 
typing skill. (Findings on relative variability 
on a level-by-level basis are discussed later.) 
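
Each F reported in Table 2 is the ratio of an effect mean square to the corresponding error mean square. As a rough check, the sketch below recomputes two of the speed Fs for the four 25-wpm levels and six 5-min. trials from the tabled mean squares; the function name is hypothetical, and small discrepancies from the printed values reflect rounding in the table.

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F statistic as effect mean square over error mean square."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error


# Speed, four 25-wpm levels by six 5-min. trials (Table 2, Section I).
ms_trials = 1214.00
ms_interaction = 435.40
ms_within_error = 105.49

f_trials = f_ratio(ms_trials, ms_within_error)
f_interaction = f_ratio(ms_interaction, ms_within_error)

print(f"F (trials) = {f_trials:.2f}")
print(f"F (levels x trials) = {f_interaction:.2f}")
```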
TABLE 2

ANALYSIS OF VARIANCE FOR SPEED AND ERRORS FOR
FOUR COMBINATIONS OF SKILL LEVELS AND TRIALS

                         Speed                     Errors
Source      df        MS             F          MS        F

I. 4 25-wpm levels and 6 5-min. trials
Levels        3     5833646.60     712.41*     201.52     7.88*
Error       156        8188.55                  25.54
Trials        5        1214.00      11.50*      14.62    12.30*
L × T        15         435.40       4.12*       1.56     1.31
Error       780         105.49                   1.18
Total       959

II. 4 25-wpm levels and 30 1-min. trials
Levels        3    29168240.00     712.41*    1007.63     7.88*
Error       156       40942.50                 127.71
Trials       29        1844.13       4.72*      14.79     3.55*
L × T        87        1104.71       2.82*       4.62     1.11
Error      4524         390.48                   4.15
Total      4799

III. 9 10-wpm levels and 6 5-min. trials
Levels        8     2260608.70    1332.80*      82.72     3.26*
Error       162        1696.12                  25.37
Trials        5        1521.00      15.02*      16.01    13.21*
L × T        40         300.25       2.96*       1.33     1.09
Error       810         101.20                   1.21
Total      1025

IV. 9 10-wpm levels and 30 1-min. trials
Levels        8    11303045.00    1332.78*     413.60     3.26*
Error       162        8480.74                 126.86
Trials       29        2140.68       5.67*      16.29     3.90*
L × T       232         884.31       2.26*       4.02      .96
Error      4698         390.71                   4.17
Total      5129

Note. Ns of 160 for four levels and of 171 for nine levels.
* p < .01.


Mean performance scores for the analyses 
of Section I of Table 2 are displayed in Tables 
3 and 4. The trend of these scores across 
trials may be taken as representative of those 
underlying the analyses of Sections II, III, 
and IV of Table 2. 


TABLE 3

GROSS WORDS PER MINUTE IN SIX SUCCESSIVE
5-MIN. SEGMENTS OF A 30-MIN. WORK
PERIOD, BY SKILL LEVEL

                                   Min.
Skill level    1-5    6-10   11-15   16-20   21-25   26-30   1-30
5-24          15.2    14.1    14.5    15.4    14.4    14.9   14.8
25-49         36.2    37.0    35.8    35.0    35.2    35.2   35.8
50-74         62.9    63.3    61.7    61.1    61.0    60.8   61.8
75-108        87.3    87.1    85.9    85.4    86.2    87.1   86.5
5-108         50.4    50.4    49.5    49.4    49.2    49.5   49.7

Note. N = 40 per level.
TABLE 4

ERRORS PER MINUTE IN SIX SUCCESSIVE 5-MIN.
SEGMENTS OF A 30-MIN. WORK PERIOD,
BY SKILL LEVEL

                                          Min.
Skill level   1-5           6-10          11-15         16-20   21-25   26-30   1-30
5-24          [illegible]   [illegible]   [illegible]   2.86    2.92    3.08    2.82
25-49         4.51          4.97          4.77          4.96    5.23    5.28    4.95
50-74         3.34          3.92          4.44          4.63    4.46    4.66    4.24
75-108        2.99          3.47          3.62          3.77    3.90    3.567   3.55
5-108         3.35          3.77          3.89          4.05    4.13    4.14    3.89

Note. N = 40 per level.


Despite the significant Fs for trials and 
for interaction (Table 2), the among-trials 
differences in speed shown in Table 3 are 
trivially small, both within and across levels. 
Intertrial differences within levels are as small 
as .0 and .1 wpm; across levels (5-108 wpm) 
no change in speed from one 5-min. segment 
to any other exceeds 1.2 wpm. The statisti- 
cally significant changes in speed among 
trials are felt to be of little practical conse- 
quence. The data of Table 4, on the other 
hand, reveal progressive increments in errors 
as the work period proceeds, both within and 
across skill levels, thus supporting the existing 
generalization about fatigue effects on qual- 
ity of work in light, sedentary tasks. At the 
same time, intertrial differences as small as .01 
and .06 epm were found (within levels), while 
across levels (5-108 wpm) no change in ac- 
curacy from one 5-min. segment to any other 
exceeds .8 epm. In general, the absolute size
of the speed and error fluctuations among
trials is judged not to justify the conventional
restriction of practice durations during
training to just a few minutes, at least not for
reasons of supposed substantial fatigue.


TABLE 5

CORRELATIONS BETWEEN TRIALS AND
PERFORMANCE MEANS

                      Speed                    Errors
Skill level    30 trials   6 trials     30 trials   6 trials
5-24              .033       .033          .648       .976
25-49            -.595      -.782          .603       .896
50-74            -.656      -.909          .760       .872
75-108           -.134       .289          .538       .728
5-108            -.616      -.830          .869       .923

Note. N = 40 per level.


Interlevel differences. On the issue of 
whether increased resistance to fatigue ac- 
companies increases in skill, the obtained cor- 
relations between trials (30 1-min. and 6 
5-min. blocks) and mean performance scores 
of the 40 Ss within each of four 25-wpm skill 
ranges are shown in Table 5. 
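
The six-trial speed correlations can be recomputed from the segment means of Table 3 by correlating trial number (1 through 6) with the mean speed in each successive 5-min. segment. The sketch below does so for the two middle skill levels; the function name is hypothetical, and the use of the ordinary Pearson product-moment formula is an assumption, since the text does not name the coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


trials = [1, 2, 3, 4, 5, 6]

# Mean gross wpm in successive 5-min. segments (Table 3).
speed_25_49 = [36.2, 37.0, 35.8, 35.0, 35.2, 35.2]
speed_50_74 = [62.9, 63.3, 61.7, 61.1, 61.0, 60.8]

r_25_49 = pearson_r(trials, speed_25_49)
r_50_74 = pearson_r(trials, speed_50_74)

print(f"25-49 wpm: r = {r_25_49:.3f}")
print(f"50-74 wpm: r = {r_50_74:.3f}")
```

Both results agree to three decimals with the corresponding six-trial speed entries of Table 5, which suggests that the tabled correlations were computed on segment means in this way.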

The magnitudes of the correlations across 
all Ss (5-108 wpm), shown in the bottom 
row of Table 5, make apparent the trend to- 
ward performance decrements as trials ac- 
cumulate—more markedly for errors than for 
speed. On the issue of differential effects for 
speed, the correlations for the least and the 
most skilled typists do not differ significantly 
from zero. More pertinent (for six trials), 
while these two correlations differ signifi- 
cantly from those for the two middle levels of 
skill (p < .001), there is no apparent trend 
in the speed correlations with increase in skill 
level. The absence of progressive differential 
effects on speed of work accompanying in- 
creases in skill is in accord with the existing 
generalization about negligible fatigue effects 
on speed of work in light, sedentary tasks. 

For errors, on the other hand, the hypothe- 
sized progressive decrease in correlations with 
increase in skill has some, if not consistent, 
support in the data (i.e., for 5-min., but not 
for 1-min., trials). Specifically, for six trials, 
interlevel differences in the obtained error 
correlations are statistically significant (p’s 
ranging from < .05 through < .001) between 
all levels except between the two middle and 
between the two highest levels. It seems likely 
that skill levels identified as 5-24, 25-49, and 
50-108 wpm would be found to differ from 
each other with respect to correlations be- 
tween skill level and mean errors per minute 
scores. In general, then, if not in clear step- 
wise fashion for the four skill levels of the 
present investigation, as skill increases there 
does appear to be increasing resistance to 
decrements in quality of work during a long 
work period. 


Consistency of Performance 


For the total sample of 234 Ss, unequally
distributed among four 25-wpm skill levels
(numbered 1-4 from low to high), mean speed
Vs of 15.20, 9.95, 7.39, and 5.72 were found,
respectively, and the obtained F in one-way
analysis of variance for these Vs (df = 3/230)
is highly significant (p < .001). Consistency
of performance varies regularly with
skill level. Although the plots of Figure 1 suggest
greater absolute speed variability among
more highly skilled typists, the decrease in Vs 
as skill level increases shows that relative con- 
sistency in speed of performance increases 
with skill—a finding that is in accord with 
intuitive expectations. The reverse was found, 
with one important shift in rank order, for 
errors. Although the obtained F was highly 
significant (p < .01 for df = 3/230), showing 
that relative variability in errors does vary 
with skill level, the Vs for the four skill levels 
(in 1-4 order) were 64.8, 47.6, 56.6, and 
60.8. The skill levels are in 1-4-3-2 order 
from least to most consistent in relative accuracy.
The greatest inconsistency among
least skilled typists might be expected on the 
grounds of the notorious variability in work 
methods (stroking techniques) exhibited by 
novices at perceptual motor skills involving 
fine movements and control over the small 
muscles. The apparent trend thereafter to- 
ward decreasing consistency, increasing vari- 
ability, in relative accuracy with increase in 
skill is puzzling and contrary to intuitive ex- 
pectations. In summary, on the question of 
consistency of performance with increase in 
skill, in accordance with expectations speed 
grows more stable with increase in skill,
whereas no regular trend was found for rela- 
tive accuracy; in fact, for those above novice 
levels of skill, the trend toward decreasing 
consistency in relative accuracy is contrary to 
normal expectations.


Score Reliability 


Intercorrelations among speed and among 
error scores were computed for cumulative 
portions of the 30-min. work period (1, 2, 3, 
5, 10, 15, 20, and 30 min.). The intercorrelation
matrix for speed (not shown here) reveals
no r below .985. For a single measure, it
is clear that a 1-min. measure furnishes, for 
all practical purposes, as reliable an index of 
speed as is provided by substantially longer 
tests. The reliability of error measures in 


TABLE 6

ERROR INTERCORRELATIONS AMONG CUMULATIVE
SEGMENTS OF A 30-MIN. WORK PERIOD
AT THE TYPEWRITER

                             Cumulative min.
Cumulative min.    2      3      5      10     15     20     30
 1               .858   .808   .753   .695   .669   .652   .652
 2                      .951   .900   .824   .794   .769   .747
 3                             .951   .883   .855   .826   .801
 5                                    .954   .926   .903   .877
10                                           .980   .961   .933
15                                                  .989   .967
20                                                         .985

Note. N = 234.


short work samples is decidedly lower, as 
shown in Table 6. 

If reliabilities at least in the .90s should be 
sought provided they are achievable in tests 
of practicable length, it appears from the 
data of Table 6 that a reasonably reliable 
estimate of typewriting errors cannot be 
achieved in less than 5 min. The 1-min. test 
timings characteristic of the early weeks of 
training and the 3-min. test timings character- 
istic of the next few months provide measures 
of errors whose reliability is too low to make 
those measures usable. For any single mea- 
sure, it would appear that a 5-min. duration 
should be a minimum. This is not to say 
that a single measure is adequate; for, as 
mentioned earlier, typewriting errors vary 
widely from one testing occasion to another. 
However, if a single measure is used, 5 min. 
should be a minimum duration. 


Implications for Training and Employment 
Testing 


Actually, an accumulation of evidence sum- 
marized by West (1967) casts doubt on the 
propriety of the heavy focus on ordinary 
copying skills in training and in employment 
testing for typists. At the same time, until 
teachers and employers come to appreciate the 
substantial irrelevance of “straight copy” 
skills to proficiency at realistic typing tasks, 
it would appear desirable to conduct that 
training and testing with maximum efficiency. 
The findings of the present investigation sug- 
gest that there is no reason to confine much 
of early training to practice durations of 1-3
min. The stimulus-response conditions that 
lead to maximum positive transfer mandate 
as close as possible a match between the prac- 
tice durations of training and those of em- 
ployment testing. While a 1-min. sample of 
straight copy performances furnishes a suffi- 
ciently reliable measure of pure stroking 
speed, an acceptably reliable index of strok- 
ing errors in a single test requires at least a 
5-min. measure. Accordingly, 5-min. practice 
and test durations seem advisable for straight 
copy typing skills. 
ORGANIZATIONAL FACTORS AND INDIVIDUAL
PERFORMANCE:


A LONGITUDINAL STUDY 1


GEORGE F. FARRIS 2 
University of Michigan 


Stability of relationships and time lags in measurement were investigated using 
information collected at two points in time about organizational factors and the 
performance of 151 engineers. Four measures of performance were correlated 
with six organizational factors: involvement in work, influence on work goals, 
colleague contact, diversity of work activities, salary, and number of subordi- 
nates. On the basis of low but statistically significant associations, it was found 
that correlations between organizational factors and performance were generally 
stable with a 6-yr. interval between measurements. Surprisingly, relationships 
were consistently stronger when performance was measured before the organi- 
zational factor. It was concluded that changes in organizational factors which 
follow performance should be considered in research design, organizational 
theory, and, especially, in interpretations of “simultaneous” associations between
organizational factors and performance.


The correlational study done in the field 
setting is the basis for much of our knowledge 
about human behavior in organizations. Typi- 
cally, a questionnaire is administered to indi- 
viduals to obtain their perceptions of “or- 
ganizational” factors such as leadership prac- 
tices or communication, and, at the same time, 
measurements are made of “output” such as 
performance or absences. Two assumptions 
are usually made in such research but rarely 
tested: (a) the relationships discovered at 
the particular time of measurement are stable; 
that is, they would occur for these people in 

1This paper is based upon the author’s disserta- 
tion, submitted in partial fulfillment of the require- 
ments for the PhD degree at the University of 
Michigan. The author is grateful for the comments 
and suggestions of the members of his committee: 
Robert L. Kahn, chairman; Frank M. Andrews, 
Basil S. Georgopoulos, Abraham Kaplan, and J. E. 
Keith Smith. Part of the research was supported by 
Grant NSG-489-28-014 from the National Aero- 
nautics and Space Administration. 

2 Requests for reprints should be sent to the au- 
thor, Massachusetts Institute of Technology, Sloan 
School of Management, 52-590, Cambridge, Mas- 
sachusetts 02139. 
this organization regardless of the time at
which the study was conducted; and (b) the
measurements of organizational factors and 
output refer to essentially the same period of 
time; that is, the correlations obtained are 
simultaneous, with no time lag between measurement
of the first and second factors.

Because these assumptions are largely un- 
tested, they are open to question. Relation- 
ships between organizational factors and out- 
put may vary over time depending upon such 
outside circumstances as technology, mission 
of the organization, external job market, or 
age of the organization. Findings in the typical
single correlational study may be a function
of a peculiar combination of such circumstances.

Similarly, the measurements of organiza- 
tional factors and output may not refer to 
the same point in time. In most studies peo- 
ple describe the current situation in their 
organizations (e.g., how satisfied they are 
with their present job), and the output mea- 
sures are taken over a span of time which is 
apt to include a considerable period prior to
the organizational description. This is espe- 
cially true when the output is infrequent 
(e.g., patents produced by scientists) or mea- 
sured subjectively (e.g., top management rat- 
ings of divisional performance). In such 
cases, early as well as current output or im- 
pressions are important. 

This study investigates the questions of sta- 
bility and simultaneity using information col- 
lected at two different points in time from 151 
engineers working for three laboratories of a 
large electronics corporation. These engineers 
were among 1,311 scientists and engineers 
who participated in an extensive investigation 
of scientists in organizations conducted by 
Pelz and Andrews (1966). Their investigation, 
like so many others, was based upon a single 
measurement of organizational factors and 
output. No investigation was made of the 
stability of the relationships obtained, and 
the output measurements referred to the 5-yr. 
period preceding the measurement of the or- 
ganizational factors. One would expect their 
findings to be especially vulnerable to prob- 
lems of stability and simultaneity, since the 
organizations involved were a part of the rap- 
idly changing electronics industry, and the 
measurements of output referred to a rela- 
tively long period of time. 


METHOD 


Self-report questionnaires were received from the 
respondents in 1959 and 1965. Included were items 
asking about six organizational factors and output 
of patents and reports. In addition, in both 1959 
and 1965, colleagues familiar with the respondents’ 
work judged its contribution to science and useful- 
ness to the organization over the past 5 yr. Pearson 
product-moment correlation coefficients were com- 
puted between the factors and performance. 


Organizational Factors 


Six organizational factors were selected for study 
because Pelz and Andrews (1966) had found them 
to be consistently associated with performance. Each 
was measured on a Likert-type scale. The factors 
studied were involvement in technical work, influ- 
ence on work goals, extent of contact with col- 
leagues, diversity of work activities, salary, and 
number of subordinates. 


Performance 


Output. Respondents indicated the number of 
“patents or patent applications” and the number of 


“unpublished technical manuscripts, reports, or 
formal talks (either inside or outside this organiza- 
tion)” which they had produced over the last 5 yr. 
This information was obtained in both 1959 and 
1965. In addition, a question was included in 1965 
asking the respondent to report his output for the 
last 2½ yr. By subtracting responses to this question
from those to the previous one, the respondent’s output
for the first 2½ yr. of the 5-yr. period was determined.
Thus, measures of output were available
for the time periods 1954-1959, 1960-1965, 1960— 
1962, and 1963-1965. 

Judgments. Senior people from both the super- 
visory and nonsupervisory levels judged the per- 
formance of all respondents with whose work they 
were directly familiar. They provided rankings of 
these respondents on two separate measures of per- 
formance over the last 5 yr.: contribution to general 
technical or scientific knowledge in the field and 
overall usefulness in helping the organization carry 
out its responsibilities. Because a ratio of one judge 
for every five respondents was maintained, the work 
of the great majority of respondents was judged 
two or more times. Although each judge worked 
individually, there was substantial agreement among 
them. These rankings by individual judges were then 
combined into an overall ranking of all the respond- 
ents within a laboratory, and a percentile rank for 
each respondent on contribution and usefulness was 
determined following the procedures of Pelz and 
Andrews (1966, Appendix A). 

Adjustment of performance scores. Three factors 
extraneous to the areas of primary research interest 
were found when taken together to account for an 
average of 8% of the variance in the performance 
scores. They were (a) highest degree earned, (b)
time since receiving highest degree, and (c) time 
with laboratory. Following the procedures of the 
larger study (Pelz & Andrews, 1966, Appendix C), 
the performance scores were each adjusted to com- 
pensate for deviations from the grand mean of 
groups at various levels of these three predictor 
factors. 
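
The adjustment can be pictured with a minimal sketch for a single background factor. The actual procedure is the one given by Pelz and Andrews (1966, Appendix C) and is not reproduced here; the helper name `adjust_for_factor`, the treatment of one factor at a time, and the sample numbers are all assumptions made for illustration. Each score is shifted by the amount its group mean deviates from the grand mean.

```python
from collections import defaultdict
from statistics import mean

def adjust_for_factor(scores, groups):
    """Hypothetical helper: remove each group's mean deviation from the grand mean."""
    grand_mean = mean(scores)
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    # How far each group's mean sits above or below the grand mean.
    deviation = {g: mean(vals) - grand_mean for g, vals in by_group.items()}
    return [score - deviation[group] for score, group in zip(scores, groups)]

# Invented performance percentiles for two degree levels.
scores = [40.0, 50.0, 60.0, 70.0]
degrees = ["BS", "BS", "PhD", "PhD"]
adjusted = adjust_for_factor(scores, degrees)
print(adjusted)
```

After the shift every group has the same mean, so differences that remain among respondents are no longer attributable to the background factor.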

Predictions. For each relationship between an 
organizational factor and a measure of performance, 
it was predicted that a significant positive correlation 
would occur. These predictions were based on Pelz 
and Andrews’ (1966) earlier findings and supported 
by several other studies of organizational behavior. 
Following their earlier procedure, conclusions were 
drawn according to the pattern of relationships be- 
tween a given organizational factor and performance. 
In this study the convention adopted was to display 
the findings according to the level of statistical sig- 
nificance attained. (Because 60% of the 151 re- 
spondents completed a short-form questionnaire in 
1959 containing only the questions on involvement 
and performance, the minimum sample size for the 
other organizational factors measured in 1959 is 50. 
Missing data reduced the sample size to 125 for 
involvement and performance measured in 1959 and 
all measurements made in 1965.) 

TABLE 1
SIZES OF CORRELATIONS NECESSARY FOR SIGNIFICANCE
AT VARIOUS LEVELS OF CONFIDENCE IN THE
PRESENT STUDY

                        Approximate size of r
Level of confidence     N = 50         N = 125
.01                     .28            .20
.05                     .22            [illegible]
.10                     .18            [illegible]
.15                     [illegible]    .10

Note.—N = 50 and N = 125 were the two most common
sample sizes of this study.

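
Threshold values of the kind listed in Table 1 can be approximated as follows. This is a sketch under stated assumptions: the article does not say how its values were computed, so a one-tailed test (the predictions were directional) and Fisher's z approximation are assumed here, and the function name `approx_critical_r` is hypothetical. The results need not match the printed table exactly.

```python
from math import sqrt, tanh
from statistics import NormalDist

def approx_critical_r(n, alpha):
    """Approximate smallest positive r significant at level alpha (one-tailed)."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    return tanh(z / sqrt(n - 3))           # invert Fisher's z transformation

for n in (50, 125):
    levels = (0.01, 0.05, 0.10, 0.15)
    print(n, [round(approx_critical_r(n, a), 2) for a in levels])
```

The sketch makes the practical point of the table visible: with the smaller sample, a noticeably larger correlation is needed to reach the same level of confidence.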
RESULTS 


Preliminary Analyses 


Before testing the predictions, an examina- 
tion was made of (a) the interrelationships 
among the various organizational factors mea- 
sured at one point in time, (6) the interrela- 
tionships among the various performance 
measures at a single point in time, and (c) 
the test-retest reliability of the organizational 
factors and performance from 1959 to 1965. 
Overall, the organizational factors were not
very highly correlated with one another. In
1959 the median correlation was .18, and in
1965, .09. The measurements of performance
were also only mildly related, with median
correlations of .30 and .23 in 1959 and 1965,
respectively. The judgments of performance
intercorrelated most highly, as one would
expect, .63 in 1959 and .56 in 1965, and the
measure of reports correlated the lowest with
the other measurements. Because of these
mild relationships, the four measurements of
performance were used separately.

Test-retest reliabilities of the measurements 
between 1959 and 1965 are shown in Table 2.

The median correlations were .32 for the 
organizational factors and .46 for performance. 
The range in reliability of the factors was 
great, from .10 for contact to .71 for salary. 
For performance, the range was small, from
.39 for patents to .49 for reports. Evidence
from another study indicates that the relative
instability of these measures reflects changes
in the scientist's work situation rather than
unreliability in the measuring instruments.
Over a 2-mo. interval Pelz and Andrews


TABLE 2

TEST-RETEST RELIABILITIES BETWEEN MEASURES
TAKEN IN 1959 AND 1965

Measure                  r       N
Factors
  Involvement           .46     133
  Influence             .24     [illegible]
  Contact               .10     54
  Diversity             .16     53
  Salary                .71     54
  No. subordinates      .39     56
Performance
  Contribution          .45     134
  Usefulness            .47     137
  Patentsᵃ              .39     130
  Reportsᵇ              .49     128

ᵃ The 1959 measure of patents correlated .39 with patents for
the period 1960-1962 and .27 with patents for the period
1963-1965.

ᵇ The 1959 measure of reports correlated .46 with reports for
the period 1960-1962 and .43 with reports for the period
1963-1965.


(1966) found a median item test-retest corre- 
lation of .62 (N = 52) for 89 items from a 
questionnaire very similar to the one used 
here. Apparently over the period 1959-1965
there were significant tendencies for those 
engineers high on performance, salary, involve- 
ment, and number of subordinates to continue 
to be high on these factors. However, previ- 
ous levels of these factors accounted for only 
15-50% of the variance of their levels in 
1965. Changes in contact, diversity, and influ- 
ence were even greater. 
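
The 15-50% range quoted above appears to follow from squaring the test-retest correlations of the more stable measures, which the text gives as running from .39 (patents) to .71 (salary). The short check below is a worked illustration of that arithmetic, not part of the original analysis.

```python
# Test-retest correlations quoted in the text for the more stable measures.
retest_r = {"patents": 0.39, "salary": 0.71}

# Squaring r gives the proportion of 1965 variance predictable from 1959 levels.
shared_variance = {name: round(r ** 2, 2) for name, r in retest_r.items()}
print(shared_variance)
```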


Stability of Relationships 


The 151 engineers who participated in the 
present study are a nonrepresentative sample 
of the 1,311 scientists and engineers of Pelz 
and Andrews’ original investigation. There- 
fore, the relationships for them between or- 
ganizational factors and performance in 1959 
as well as 1965 were examined. It will be 
recalled that Pelz and Andrews (1966) found 
small but consistent positive associations be- 
tween performance and all the factors under 
investigation in this study. Table 3 shows the 
relationship for 151 engineers. It is apparent 
that in both 1959 and 1965 Pelz and Andrews’ 
(1966) general findings again appear when 
the engineers of the present study are consid- 
ered separately. In 1959 significant positive 
associations occurred between at least one 
TABLE 3
RELATIONSHIPS BETWEEN FACTORS AND PERFORMANCE FOR MEASURES TAKEN IN
1959 AND 1965

                                           1959
Factor                    Contribution  Usefulness   Patents      Reports
Involvement—self-report   .11*          [illegible]  [illegible]  [illegible]
Influence                 [illegible]   [illegible]  .13          [illegible]
Contact—number            .02           .19**        .16*         .05
Diversity                 [illegible]   -.17         [illegible]  .04
Salary                    [illegible]   [illegible]  [illegible]  [illegible]
No. subordinates          [illegible]   [illegible]  [illegible]  [illegible]

                                           1965
Factor                    Contribution  Usefulness   Patents      Reports
Involvement—self-report   [illegible]   [illegible]  .29****      -.18
Influence                 .09           .08          .19***       -.03
Contact—number            -.04          .07          -.01         -.16
Diversity                 .05           -.00         [illegible]  .02
Salary                    [illegible]   [illegible]  [illegible]  .12
No. subordinates          [illegible]   [illegible]  .16***       -.00

* p < .15.
** p < .10.
*** p < .05.
**** p < .01.


measure of performance and all of the or- 
ganizational factors except contact. (A mea- 
sure of frequency of contact did show a sig- 
nificant association with patents.) In 1965 
significant positive associations occurred be- 
tween at least one measure of performance 
and four of the six organizational factors: in- 
volvement, influence, salary, and number of 
subordinates. In the case of diversity, a
relationship with patents significant at the .05
level of confidence in 1959 was reduced to a 
slight tendency (.15 level of confidence) in 
1965. In 1959 there were 11 relationships be- 
tween organizational factors and measures of 
performance significant at the .05 level. In 
1965, there were 9. Thus, although the spe- 
cific relationships are not identical, two con- 
clusions may be drawn from Table 3: (a) in 
general, Pelz and Andrews’ findings hold for 
the sample of this study, and (b) the relationships
are very similar in 1959 and 1965.


Simultaneity of the Relationships 


Relationships between the organizational 
factors measured in 1959 and performance 
over the succeeding 5 yr. are shown in Table 
4. Four of the organizational factors—in- 
volvement, contact, diversity, and the number 
of subordinates—were related significantly to 
one kind of subsequent performance. How- 
ever, these are the only four correlations be- 
tween the organizational factors and subse- 
quent performance significant at the .05 level 
of confidence. More than twice as many sig- 


nificant relationships occurred in the two situ- 
ations where performance was measured be- 
fore the organizational factor (11 in 1959 and 
9 in 1965). 

A factor-by-factor comparison underscores 
these differences. For involvement, influence, 
salary, and number of subordinates, more sig- 
nificant relationships occurred when perform- 
ance was measured before the factor. For con- 
tact the one significant relationship occurred 
when the factor was measured first, and for 
diversity the timing of the measurements ap- 
parently makes little difference, although per- 
formance measured in 1965 related more 
strongly to the previous level of diversity. 

Table 5 shows relationships between the
organizational factors and previous and subsequent
output for a 2½-yr. period. Again four
factors—involvement, influence, diversity, and


TABLE 4

RELATIONSHIPS BETWEEN SIX ORGANIZATIONAL
FACTORS AND SUBSEQUENT PERFORMANCE

Factor              Contribution  Usefulness   Patents      Reports
Involvement         -.33          .06          .19***       -.10
Influence           [illegible]   .08          -.00         [illegible]
Contact             .04           .22***       -.05         -.07
Diversity           [illegible]   [illegible]  [illegible]  .26***
Salary              .10           [illegible]  [illegible]  [illegible]
No. subordinates    .14*          .25***       -.08         [illegible]

* p < .15.
** p < .10.
*** p < .05.
**** p < .01.




TABLE 5

RELATIONSHIPS BETWEEN ORGANIZATIONAL FACTORS
AND PREVIOUS AND SUBSEQUENT OUTPUT FOR 2½ YR.

                     Factor measured first      Output measured first
Factor               Patents      Reports       Patents      Reports
Involvement          [illegible]  .10           [illegible]  -.15
Influence            [illegible]  .12           .17***       -.04
Contact              .00          -.03          -.07         -.15
Diversity            [illegible]  [illegible]   [illegible]  .10*
Salary               [illegible]  .06           [illegible]  .13**
No. subordinates     .06          [illegible]   .02          -.01

* p < .15.
** p < .10.
*** p < .05.
**** p < .01.


salary—show stronger relationships to previ- 
ous performance. Contact and number of 
subordinates do not show significant relation- 
ships. Apparently, then, the timing of the 
measurements does make a difference in rela- 
tionships. When performance is measured 
first, the relationships are stronger than when 
the factor is measured first. 

In order to determine whether this pattern 
of associations held under different conditions 
of measurement, several additional analyses 
were performed. Separate analyses were con- 
ducted for each of the three sites to determine 
whether peculiarities of the organizational 
climates of individual laboratories had led to 
spurious findings. A second approach at- 
tempted to minimize the effects of changes in 
the engineer’s job situation so that a person’s 
reported level on a factor in 1959 would be 
more apt to continue for the entire time span 
over which subsequent performance was mea- 
sured. The analysis was repeated for 43 en- 
gineers who had been working as “bench 
scientists” throughout the period of the study 
(defined as those who had fewer than four 
subordinates reporting to them in both 1959 
and 1965). Third, partial correlations were 
computed to hold constant past levels of the 
more recent factor in the zero-order correla- 
tions. That is, relationships were determined 
between each factor and subsequent perform- 
ance holding constant past performance, and 
between performance and subsequent amounts 


of each factor, holding constant past amounts 
of each factor. Fourth, the analysis was
repeated using eta rather than Pearson r as the
measure of association in order to test for 
curvilinearity. Finally, associations between 
organizational factors and unadjusted per- 
formance scores were examined to determine 
whether they differed substantially from the 
associations obtained using the adjusted 
scores. In each of these additional analyses, 
the original pattern of findings was strongly 
supported.³
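
The third of these checks, holding an earlier measurement constant, corresponds to a first-order partial correlation. The sketch below uses the standard formula for that statistic; the function name `partial_r` and the three input correlations are invented for illustration and are not values from the study.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented values: x = factor in 1959, y = later performance, z = past performance.
print(round(partial_r(r_xy=0.30, r_xz=0.40, r_yz=0.50), 3))
```

When the control variable is correlated with both measures, the partial correlation is smaller than the zero-order one, which is why the check matters for the question of which measurement came first.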


DISCUSSION 


On the basis of low but statistically signifi- 
cant associations, it was found that the rela- 
tionships between the six organizational fac- 
tors and performance were stable from 1959 
to 1965 but that the assumption of simul- 
taneity did not hold. Relationships were con- 
sistently stronger when performance was 
measured before the organizational factor. 

The first finding is encouraging to organi- 
zational theorists, since it suggests that rela- 
tionships between organizational factors and 
performance are consistent over time, even in 
so rapidly changing a climate as the electronics
industry in the “sputnik era.”

The second conclusion, however, is dis- 
couraging in two of its implications. It sug- 
gests that different conclusions will be drawn 
in studies relating organizational factors and 
performance depending on the timing of the 
measurements. When performance was mea- 
sured first, stronger relationships occurred 
than when performance was measured during 
the period following the measurement of the 
organizational factor. Most studies of organi- 
zational behavior have assumed simultaneity 
and examined relationships between past lev- 
els of performance and the present level of 
the organizational factor. The assumption of 
simultaneity, at least for the time spans used 
in this study, is a doubtful one. 

The second implication is that conclusions 
about causal relationships from such investi- 
gations based upon the assumption of simul- 
taneity are apt to be inaccurate. It is logically 
impossible for a factor which occurred in the 
past to have been caused by another which 


³ For details, see Farris (1967a).
occurs in the present.⁴ Yet organizational
theorists looking for patterns of relationships 
in correlational studies of organizational be- 
havior have drawn such conclusions. For ex- 
ample, on the basis of the extensive body of 
literature on influence in organizations (e.g., 
Tannenbaum, 1962), Likert (1961, 1967) has 
argued that influence causes performance. To 
the extent that these correlational studies were 
not based on simultaneous measurements, such 
a conclusion is unsupported. The more parsi- 
monious causal interpretation is that per- 
formance causes influence, since performance 
was measured over a period of time preceding 
the measurement of influence. 

The most striking finding in this study was 
that, although four organizational factors 
were found to relate significantly to one kind 
of subsequent scientific performance, performance was
found to relate significantly to at least one 
subsequent measure of each of the organiza- 
tional factors. Apparently performance is fol- 
lowed by measurable changes in the social- 
psychological working environments of peo- 
ple in organizations like those of the present 
study. Such a phenomenon probably has not 

⁴ Elsewhere the author (Farris, 1967a, 1967b) has
developed a method for investigating causal
relationships based upon this fact.


been given sufficient recognition in past the- 
ories because it has not often been examined 
in past research. In the performance-oriented 
organizations of our society, it has been 
treated as a desired end result rather than a 
potential cause. The changes in organiza- 
tional factors which follow performance 
should be considered in organizational theory, 
research design, and interpretations of “si- 
multaneous” associations between organiza- 
tional factors and performance. 


CONCEPTUAL AND OPERATIONAL PROBLEMS IN THE
MEASUREMENT OF VARIOUS ASPECTS OF JOB
SATISFACTION ¹


MARTIN G. EVANS ²


School of Business, University of Toronto 


Some of the various concepts and operations that have been suggested for the 
measurement of job satisfaction are introduced. An effort is made to explore 
the conceptual and operational relationships between overall job satisfaction, 
level of aspiration, level of attainment, and level of importance. In particular, 
note is taken of inappropriate ways in which these three latter concepts have 
been combined. Finally, a conceptual framework is suggested as a guide to the
most appropriate methods of combination.


The concept of job satisfaction is a many- 
faceted one. Although some students see it as 
a generalized affective orientation to all as- 
pects of the work situation (Vroom, 1964, p. 
99), it is clear that such a view expresses the 
resultant of a whole host of orientations to 
specific aspects of the job. The respondent, in 
filling out a measure of general attitude (such 
as the Brayfield-Rothe, 1951, scale) or in 
taking an action such as terminating his em- 
ployment with the organization, is balancing 
in some complex way the pros and cons of his 
present job. Students have been quick to 
realize this and have developed measures that 
tap various aspects of the job (work itself, 
supervision, peers, working conditions, and so 
on). In addition, some have gone further to 
try to tap the more basic dimensions of a 
worker’s responses about his level of satisfac- 
tion of various psychological needs (physio- 
logical, safety, social, ego, and self-actualiza- 
tion). 

At the outset, some of the different aspects 
of job satisfaction should be defined in an 
attempt to provide a consistent vocabulary for 
use in this paper. 


1. The measurement of overall satisfaction. 
As suggested above, this represents a gen- 
eralized affective orientation to all aspects of 


¹ A version of this paper was presented at the
Conference of the Association of Canadian Schools of
Business in Calgary, June 10, 1968. The author wishes
to thank Lee Bolman, Ed Lawler, and Lyman Porter
for their stimulating comments.

² Requests for reprints should be sent to the author,
School of Business, University of Toronto,
119 St. George Street, Toronto 5, Canada.
the job. The methods most frequently used
for this purpose involve attitude scales. Such 
scales may be of the overt, explicit type, such 
as the Brayfield-Rothe (1951) scale with 
items like, 

I like my job better than the average
worker does.
Strongly agree          Strongly disagree



or they may be of the projective type such as 
Kunin’s (1955) “Faces” scale. The term 
“overall job satisfaction” will be used for this 
concept. 

2. The measurement of satisfaction with 
various aspects of the job. Once again, this is 
an attitude measurement. Scales may be of 
the explicit satisfaction type, 

The pay I get for my job.
Highly satisfied          Highly dissatisfied

or of a descriptive type,

The XYZ Company pays as well as any
around here.
Strongly agree          Strongly disagree


or of the type such as the Cornell Job De- 
scriptive Index (JDI—Hulin, Smith, Kendall, 
& Locke, 1963; Kendall, Smith, Hulin, & 
Locke, 1963; Locke, Smith, Hulin, & Kendall, 
1963; Macaulay, Smith, Locke, Kendall, & 
Hulin, 1963; Smith, 1963; Smith & Kendall, 
1963), 


Work itself

Frustrating     Yes   ?   No
Hot             Yes   ?   No
Challenging     Yes   ?   No


The term “job-facet satisfaction” will be used
in referring to this concept.

3. The measurement of the attainment of
either needs or goals. Spitzer (1964, pp. 36— 
37) tried to make a distinction between the 
two concepts of need attainment or goal at- 
tainment. He saw the latter (goal attainment) 
as a directly observable behavioral phenome- 
non and suggested that it can be measured by 
such items as, 


The Opportunity to Develop and Try New Ideas— 
this means having the chance to be creative by 
developing new ways to work and supervise. Also to 
be given the chance to try these new things out so 
they get a fair test. How much is there now? 
[Spitzer, 1964, Appendix B]. 


while the former (need attainment) was seen 
as a less observable attitude that can only be 
tapped through the use of a measure that 
demands a higher level of introspection and 
self-awareness by the respondent. Spitzer sug- 
gested that Porter (1962, 1963a, 1963b) was 
attempting to make measurements at this 
level with such items as, 

The feeling of self-esteem a person gets 
from being in my management 
position. 

How much is there now? 

However, Spitzer rightly pointed out that
many of the scales in Porter’s questionnaire
are more like goal-attainment items than
need-attainment items, for example,

The opportunity for personal growth and 
development in my management position. 
How much is there now? 

It would appear that at the operational level
the concepts of goal attainment or need
attainment have not been distinguished.
Accordingly, the term “goal attainment” will be
used in describing both concepts.³

4. The measurement of level of aspiration 
for both needs and goals. This concept deals 
with the feeling a person has about how much 
of a particular goal or need he should have. 


³ It would appear important to make a distinction
between this concept (goal attainment) and the 
preceding one (job-facet satisfaction). Goal attain- 
ment (in either of its forms) refers to an estimate 
by the individual of how much of a need or goal he 
is getting. Job-facet satisfaction refers to how satis- 
fied he is with that aspect of the job, in other words, 
how satisfied he is with his present level of attain- 
ment. Note that in operational terms it is possible 
to distinguish between job-facet satisfaction and 
goal attainment. 


Again, Spitzer’s argument about the difference 
between needs and goals could be applied, but 
operationally separation of the two might 
prove difficult. The term “goal aspiration” 
will be used for this concept. 

5. The measurement of importance of either 
(a) job facets or (b) needs and goals. These
concepts refer to the saliency of a particular 
aspect of the job or of a need or goal, that is, 
the strength of the need or goal to the indi- 
vidual. The terms “job-facet importance” and 
“goal importance” will be used to refer to 
these concepts. 


Having developed measures for these dif- 
ferent aspects of satisfaction, the question 
arises as to how they should be combined in 
order to arrive at a job-satisfaction score 
that best represents the individual’s overall 
affective orientation. To this, there have been 
several solutions; it is the purpose of this 
paper to examine them and their implications. 


Methods of Combining Aspects of Satisfaction 


A review of the literature suggests that at 
least five ways of combining these aspects to 
get some measure of overall satisfaction have 
been developed. They vary in elegance and 
complexity. It is the aim here to evaluate 
these combinations. First, however, they must 
be described. 


1. Simple summation of either (a) job-facet
satisfaction or (b) goal attainment. The
researcher simply asks his respondents about 
the satisfaction level of each facet or the 
attainment level of each goal. The total score 
is obtained by summing over facets or goals. 

2. Summation of the product of either (a)
job-facet satisfaction and job-facet importance
or (b) goal attainment and goal importance.
Here the researcher determines how
satisfied his respondents are (or how much 
they have attained) on each dimension and 
how important each dimension is to them; the 
level of satisfaction (or attainment) is 
weighted by the importance, and the product 
is summed over facets or goals to arrive at an 
overall score. 

3. Summation of the difference between the
level of goal aspiration and the level of goal
attainment. Here the researcher asks each
respondent not only how much of the goal he
is now getting, but also how much he thinks
that he should be getting; satisfaction is taken
to be the difference between aspiration level
and attainment level and is summed over
goals for the overall score.

4. Summation of the product of goal importance
and the difference between level of
aspiration and level of attainment. This is 
similar to Method 3 except that information 
on importance is also gathered, and the dif- 
ference score is weighted by this before the 
final summation takes place. 

5. Summation of the differences between 
goal importance and goal attainment or goal 
aspiration. Here the researcher gathers infor- 
mation on the two dimensions involved and 
subtracts from importance the other (aspira- 
tion or attainment); the overall score is ob- 
tained by summing the differences over goals. 
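
The five methods listed above can be written out as short scoring functions. This is a sketch under stated assumptions: the function names, the 7-point ratings, and the three goals are invented for illustration, and each rating list holds one value per goal (or facet) for a single respondent.

```python
def simple_sum(attain):
    # Method 1: sum of attainment (or facet-satisfaction) ratings.
    return sum(attain)

def importance_weighted_sum(attain, importance):
    # Method 2: each rating weighted by its importance, then summed.
    return sum(i * a for i, a in zip(importance, attain))

def discrepancy_sum(aspire, attain):
    # Method 3: aspiration minus attainment, summed over goals.
    return sum(s - a for s, a in zip(aspire, attain))

def weighted_discrepancy_sum(aspire, attain, importance):
    # Method 4: Method 3 with each difference weighted by importance.
    return sum(i * (s - a) for i, s, a in zip(importance, aspire, attain))

def importance_minus_other(importance, other):
    # Method 5: importance minus attainment (or aspiration), summed.
    return sum(i - o for i, o in zip(importance, other))

# Invented 7-point ratings for three goals.
attainment = [5, 3, 6]
aspiration = [6, 5, 6]
importance = [7, 2, 4]

print(simple_sum(attainment))
print(importance_weighted_sum(attainment, importance))
print(discrepancy_sum(aspiration, attainment))
print(weighted_discrepancy_sum(aspiration, attainment, importance))
print(importance_minus_other(importance, attainment))
```

Note that under Methods 3 and 4, as written here, a larger total reflects a larger shortfall of attainment relative to aspiration.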


In the discussions that follow, each of the 
five methods of combination will be discussed 
in conceptual terms; its appearance in the 
literature will be reviewed, with special em- 
phasis upon those studies that report the 
relationship between the combination and a 
measure of overall job satisfaction. Finally, 
any possible comparisons with previous com- 
binations will be made. 

Simple summation. Conceptually, this is the
simplest of the methods of combination. It
is, in addition, the easiest of the measures to 
obtain; only one set of questions need be 
asked of the respondent. The validity of such 
a measure depends upon one assumption— 
that each aspect of the respondent’s job-facet 
satisfaction or goal-attainment space is of 
equal importance; that is, each can be as- 
signed an equal weight. 

Ewen (1967) compared the Cornell JDI, a 
measure of job-facet satisfaction which taps 
the facets of work itself, pay, promotion op- 
portunities, supervision, and peers, with the 
Brayfield-Rothe (1951) scale and the Faces 
scale (Kunin, 1955), both measures of overall 
satisfaction. The correlations obtained for three
samples are reported in Table 1. With
both overall satisfaction scales, the
summed JDI shows a high correlation.
Schaffer (1953) related the sum
of need satisfactions (in present terms, a 


TABLE 1

CORRELATIONS BETWEEN SUMMED JOB DESCRIPTIVE
INDEX SCORES (UNWEIGHTED) WITH THE
BRAYFIELD-ROTHE AND FACES SCALES

Sample     B-R            Faces     N
A          [illegible]    .74       21
B          .50            .70       23
C          .66            .55       120

Note.—From Ewen (1967).


measure of goal attainment) to a single mea- 
sure of overall job satisfaction. The correla- 
tion is reported in Table 4. The sum of the 
attainment of the 12 needs tapped by Schaffer 
correlated highly with his measure of overall 
satisfaction. Blai’s (1964) study, in which he 
claimed a correlation of .58 between overall 
job satisfaction and the sum of need-satisfac- 
tion items (measures of goal attainment), is 
judged to be uninterpretable in that the latter 
measure showed serious confusion between 
measures of need satisfaction and measures of 
need importance; most of the items dealt with 
importance. As a corollary, Evans (1968) 
investigated the relationship between job- 
facet satisfaction and goal attainment. The 
correlations between common scales are pre- 
sented in Table 5 for two samples. These are 
high, and it has been shown that the scales 
show convergent and discriminant validity 
(Evans, 1969). 

In summary, it would appear that the rela- 
tionships between overall satisfaction and 
both job-facet satisfaction and goal attain- 
ment are highly positive. 

Summation of the product of importance
and satisfaction or attainment. Conceptually,
this is a more elegant formulation than the
first. It takes into account the individual
differences that may exist in the importance of
the facets of satisfaction. In other words, it
helps to account for the complex balance that
each individual attains in arriving at an overall
evaluation of his satisfaction from his
reactions to the specific facets of the job. Again,
the data are quite simple to gather.

Ewen (1967) weighted the JDI scores
(measures of job-facet satisfaction) by the
measure of the importance of each scale and
correlated the sum of the weighted scales with
the Brayfield-Rothe and Faces scales. The
results are presented in Table 2. In addition, he
compared the correlation of the satisfaction
of the most important JDI scale and the
least important JDI scale with the overall
measures (Brayfield-Rothe and Faces). The
results in Table 3 show that the correlations
are higher for the most important JDI scale.
Schaffer found a similar result for his
goal-attainment scales; the results are reported in
Table 4. Finally, Ewen obtained very high
correlations (.98, .99, .99 for the three
samples) between the unweighted total JDI and
the weighted (with importance) total JDI.


TABLE 2

CORRELATIONS BETWEEN SUMMED JOB DESCRIPTIVE
INDEX SCORES (WEIGHTED) WITH THE
BRAYFIELD-ROTHE AND FACES SCALES

Sample     B-R     Faces          N
A          .75     [illegible]    21
B          .48     .68            23
C          .66     .56            120

Note.—From Ewen (1967).


TABLE 3

CORRELATIONS BETWEEN MOST IMPORTANT AND LEAST
IMPORTANT JOB DESCRIPTIVE INDEX SCALES WITH
THE BRAYFIELD-ROTHE AND FACES SCALES

Most important                 Least important
B-R     Faces     N            B-R            Faces          N
.61     .50       100          [illegible]    [illegible]    [illegible]

Note.—From Ewen (1967).


Comparisons among Tables 1, 2, 3, and 4
enable judgments to be made as to whether
the weighting procedure enhances the
relationship between overall satisfaction and the
sum of the satisfactions with each facet or the
goal attainments. The evidence is inconclusive.
Both Ewen (job-facet satisfaction) and
Schaffer (goal attainment) suggested that slight
increases in correlation coefficients can be
achieved by weighting the satisfaction
components with importance, but such increases
are not automatic and in no case are they
significant statistically. However, when
comparisons are made between least important and
most important facets and their correlations
with overall satisfaction, the differences are
sizable.


TABLE 4

CORRELATION BETWEEN OVERALL JOB SATISFACTION
AND NEED SATISFACTION

Satisfaction        Correlation
Total               r = .44*
Most important      r = [illegible]
Least important     r = .13

Note.—From Schaffer (1953). N = 72.
* p < .001.


TABLE 5

CORRELATIONS BETWEEN FACETS OF JOB
SATISFACTION (JOB DESCRIPTIVE INDEX)
AND NEED SATISFACTION

Sample and facet      r
Utilityᵃ
  Pay                 [illegible]
  Supervision         .60***
  Fellows             [illegible]
  Work itself         [illegible]
Hospitalᵇ
  Pay                 [illegible]
  Supervision         [illegible]
  Fellows             [illegible]
  Work itself         .16*

Note.—From Evans (1969).
ᵃ N = 311.
ᵇ N = 83.
* p < .05.
** p < .01.
*** p < .001.

Sum of differences in goal-aspiration and 
goal-attainment levels. This method is slightly 
more sophisticated than the first. In the first, 
the researcher is asking the individual how 
satisfied he is. The individual presumably 
makes judgments for himself about his aspira- 
tions and his present level of attainment with 
regard to his goals and takes these into ac- 
count in his answer to the question of how 
satisfied he is with a particular goal. In this 
third method, the process is made explicit. 
The respondent records judgments of his as- 
piration and attainment; satisfaction is taken 
to be the difference between them. Thus, in 
terms of the concepts originally introduced, 
the following relationship is proposed: For a 
given job facet and its corresponding goal 
areas, job-facet satisfaction is equivalent to
the difference between goal aspiration and 
goal attainment. 

Such a combination has had considerable 
use in the literature. However, it has been 
used for the operationalization of a number of 
different concepts. Porter (1962, 1963a, 
1963b) referred to it as the “perceived defi- 
ciency in need fulfillment,” while Spitzer 
(1964) used a similar set of operations to 
measure “goal attainment.” Spitzer reported 
the relationship between such a scale (in the 
areas of opportunities to use new ideas, job 
security, pay, approval from peers, superiors, 
and subordinates, control over job, promotion, 
and personal growth opportunities) and the 
Brayfield-Rothe scale. The multiple correla- 
tion coefficient, a statistical device for taking 
into account all the facets, is presented in 
Table 6. The correlation is highly positive. 

There are no data available to show 
whether this method of combination is su- 
perior to the two previous ones. It has been 
suggested above that it is conceptually more 
elegant. With the data collected by Spitzer 
(1964), a reanalysis could explore the differ- 
ences between overall satisfaction versus the 
sum of goal attainments, and overall satisfac- 
tion versus the sum of the differences between 
goal aspiration and goal attainment. A fu- 
ture project is being planned by the author 
with this in mind. 

Sum of the product of importance and the 
differences. This is similar to the third method 
except that the difference score is weighted by 
the importance of the goal before the sum- 
mation is made. This is conceptually elegant. 
It makes explicit the differences in impor- 
tance, in aspiration, and in attainment. 

The only data available are those provided
by Spitzer (1964). The highly positive multiple
correlation between the Brayfield-Rothe
scale and the scale of weighted differences is
presented in Table 6.


TABLE 6

MULTIPLE CORRELATION COEFFICIENT BETWEEN
BRAYFIELD-ROTHE AND FACETS OF NEED
SATISFACTION (WEIGHTED AND UNWEIGHTED)

Facet          R
Unweighted     [illegible]
Weighted       307

Note. From Spitzer (1964).
*p < .01.

Comparisons within Table 6 indicate that 
the fourth method gives a higher multiple 
correlation than the third method; how- 
ever, the differences between them are not 
significant. 

Sum of differences between importance and 
attainment or aspiration. In this method, the 
sum (over goals) of the differences between 
responses on importance of a goal and its 
degree of attainment or level of aspiration 
is taken to represent the overall satisfaction 
score. Conceptually this seems meaningless. 
How can such a difference represent a level 
of satisfaction? An example will indicate the 
problems with such a position. Assume three 
people respond within the following levels of 
importance and attainment: 


Importance 7 3 1 
Attainment 7 3 1 


They will all, by this method of combination,
have equal satisfaction (in this case, a score
of zero). Surely a multiplicative model,
Importance × Attainment, in which satisfaction
scores of 49, 9, and 1, respectively, were
obtained, would be a more accurate
representation of reality.
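
The contrast between the two models can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the three hypothetical respondents given above, and the function names are assumptions introduced here, not part of any published scoring procedure.

```python
# Three hypothetical respondents from the example in the text
# (ratings on 7-point scales).
importance = [7, 3, 1]
attainment = [7, 3, 1]


def difference_score(imp, att):
    """Combination under criticism: importance minus attainment."""
    return imp - att


def product_score(imp, att):
    """Multiplicative alternative: importance times attainment."""
    return imp * att


difference_scores = [difference_score(i, a) for i, a in zip(importance, attainment)]
product_scores = [product_score(i, a) for i, a in zip(importance, attainment)]

print(difference_scores)  # [0, 0, 0]
print(product_scores)     # [49, 9, 1]
```

The difference model cannot separate the three respondents, whereas the product model orders them by the level at which importance and attainment coincide.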

In spite of this, much research has been 
carried out using this method. In 1960, 
Glennon, Owens, Smith, and Albright sug- 
gested that a method similar to this be 
adopted for the measurement of morale in 
order to “permit management to identify 
‘sore spots’ or low satisfaction issues [p. 
107].” For such a purpose, this measure of 
combination may be appropriate. It enables 
management to identify situations in which 
low satisfaction is coupled with high impor- 
tance and the opposite situation in which high 
satisfaction is coupled with low importance. 
Therefore, it does indicate areas of concern. 
However, it is not appropriate as a measure 
of overall job satisfaction. 

Empirically, it is conceivable that importance
and aspiration level might show a high
positive correlation, in which case the operations
would be interchangeable so that the
combination, importance minus satisfaction,
would be a predictor of overall job satisfaction.
However, no data have been published
that bear on this question. Until such time
as it is demonstrated that importance and
aspiration are operationally interchangeable,
this combination should be avoided as a
measure of overall job satisfaction.


TABLE 7

MEAN DIFFERENCES IN SCORES (STRENGTH OF NEED,
DEGREE TO WHICH IT IS ATTAINED, DISSATISFACTION)
BETWEEN RESIGNED WORKERS AND MATCHED
CONTINUING WORKERS

                                        Type of need
Aspect of needᵃ            Recognition   Achievement   Autonomy
Strength (A)                  .08        [illegible]     .09
Attainment (B)               −.56***       −.08         −.44**
Dissatisfaction (A − B)       .64***     [illegible]   [illegible]

Note. From Ross and Zander (1957, p. 335). N = 169.
ᵃ In all aspects, the scores represent differences between
continuing and resigned workers.
*p < .0[illegible].
**p < .025.
***p < .0025.


Several researchers (Beer, 1966; Kuhlen, 
1963; Pelz & Andrews, 1966; Ross & Zander, 
1957) have used this combination in order 
to obtain a measure of overall job satisfac- 
tion. Ross and Zander (1957) suggested that 
leaving the organization (a behavioral mani- 
festation of job dissatisfaction) was associated 
with dissatisfaction with the fulfillment of 
the following needs: recognition, achievement, 


and autonomy; that is, leavers were more 
dissatisfied than a demographically matched 
group that remained with the company. They 
measured “need strength” (or the importance 
of the need to the individual) and need satis- 
faction (the degree to which the job pro- 
vided fulfillment of the need). The dissatis- 
faction score was obtained by subtracting 
need satisfaction from need importance. The 
results of this study are reported in Table 7. 
There was little difference between leavers 
and continuers in need strength (i.e., need
importance). In other words, for each need, 
both leaving and continuing workers rated 
it equally important. Major differences be- 
tween the groups were found for the needs of 
recognition and autonomy in the degree to 
which the need was satisfied. The resulting 
intergroup differences in the importance- 
attainment difference are the direct result 
(with the possible exception of the difference 
for the achievement need) of differences in 
attainment alone. Kuhlen (1963) suggested
a strong relationship (for men but not for
women) between overall job satisfaction and
the discrepancy between potential for need
satisfaction in the job and the individual’s
need strength (measured on the EPPS, Edwards,
1954). Beer (1966), in order to determine
a score for “Actual Need Satisfaction,”
obtained the difference between: (a)
the score on “Job Inventory” which is a
measure of the perceived opportunity to satisfy
a need on the job and is equivalent to
asking about the extent to which the need
is presently satisfied, and (b) the score on
“Preference Inventory” which is a measure
of need importance. Beer (1966) reported
the relationships between leadership behavior
(Initiation of Structure and Consideration)
and both “Perceived Opportunity for Need
Satisfaction” (the extent to which the need
is satisfied at present) and “Actual Need
Satisfaction” (the hybrid measure under criticism).
These results are presented in Tables
8 and 9, respectively. For both measures, the
relationships are not strong; however, they
do appear to be stronger in the first case
where relationships with the conceptually pure
measure are presented. Finally, Pelz and
Andrews (1966) used as a measure of overall
satisfaction the difference between (a) desire,
measured by questions about the importance
of a particular aspect, and (b) provision,
measured by questions about the attainment
of a particular aspect. They explicitly (Pelz
& Andrews, 1966, pp. 120-121) equated the
measure of importance with level of aspiration,
a position that, as has been suggested
above, requires empirical justification. The
results related job satisfaction (and desire and
provision) to components of job performance
for scientists and engineers. Such a relationship
is undoubtedly a complex one, and
simple correlations are not to be expected
(see Brayfield & Crockett, 1955; Evans,
1969; Spitzer, 1964). However, it would
appear that the measure of total satisfaction
did not correlate much better with performance
than did the total provision score. The
correlation of total desire with performance
was not strong. It must be pointed out that
Pelz and Andrews were aware of some of the
difficulties with this method of combination
and preferred that the results for desire and
provision be reported separately.


TABLE 8

PEARSON PRODUCT-MOMENT CORRELATIONS BETWEEN LEADERSHIP BEHAVIOR AND
WORKER-PERCEIVED OPPORTUNITY FOR NEED SATISFACTION

                              Perceived opportunity for need satisfaction
Leadership behavior      Security   Social   Esteem        Autonomy        Self-actualization
Initiating structure       .20*      −.14     .04           −.13             .04
Freedom of action         −.21*      −.16     .05          [illegible]**    −.02
Consideration             −.22*      −.20*   [illegible]     .04             .21
Production emphasis        .09        .10     .01           −.15             .06

Note. From Beer (1966, Table 12, p. 43).
*p < .05.
**p < .01.


TABLE 9

PEARSON PRODUCT-MOMENT CORRELATIONS BETWEEN LEADERSHIP BEHAVIOR AND
WORKER NEED SATISFACTION

                              Actual need satisfaction
Leadership behavior      Security   Social   Esteem   Autonomy      Self-actualization
Initiating structure       .06       −.08     .08      −.02           −.05
Freedom of action         −.08       −.19*    .02     [illegible]     −.04
Consideration             −.07       −.21*    .09     [illegible]      .12
Production emphasis        .01        .04    −.05      −.07           −.07

Note. From Beer (1966, Table 13, p. 44).
*p < .05.


Discussion 


In the preceding section, data have been 
presented that indicate the relationships be- 
tween overall job satisfaction and the aspects 
of job-facet satisfaction, goal aspiration, goal 
attainment, and goal importance. From these
data, it is clear that the following combinations
have some merit in that they all
show significant correlations with overall
satisfaction:

Combination 1. Overall job satisfaction 
(JS) is the sum of job-facet satisfaction 
(JFS) or goal attainment (GAtt). 


$$\mathrm{JS} = \sum_{\text{facets}} \mathrm{JFS} \qquad [a]$$


$$\mathrm{JS} = \sum_{\text{goals}} \mathrm{GAtt} \qquad [b]$$


Combination 2. Overall job satisfaction is 
the sum of either the product of job-facet 
satisfaction and job-facet importance (JFI) 
or the product of goal attainment and goal 
importance (GImp). 


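$$\mathrm{JS} = \sum_{\text{facets}} (\mathrm{JFS} \times \mathrm{JFI}) \qquad [a]$$


$$\mathrm{JS} = \sum_{\text{goals}} (\mathrm{GAtt} \times \mathrm{GImp}) \qquad [b]$$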


Combination 3. Overall job satisfaction is 
the sum of the differences between goal aspi- 
ration (GAsp) and goal attainment. 


$$\mathrm{JS} = \sum_{\text{goals}} (\mathrm{GAsp} - \mathrm{GAtt})$$


Combination 4. Overall job satisfaction is
the sum of the product of goal importance
and the difference between goal aspiration
and goal attainment.


$$\mathrm{JS} = \sum_{\text{goals}} \bigl(\mathrm{GImp} \times (\mathrm{GAsp} - \mathrm{GAtt})\bigr)$$


Thus the researcher is faced with a plethora 
of methods of combination, all of which pro- 
vide relatively strong correlations with mea- 
sures of overall job satisfaction. While it may 
be that decisions about which method to use 
can be based upon practical expediency (i.e., 
Ss can only be expected to complete a short 
questionnaire), it is desirable, given a value 
system that includes parsimony and elegance 
in research design, that the decision be made 
to use a method of combination that is 
congruent with the researcher’s conceptual 
framework.
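
To make the differences among the combinations concrete, the following sketch computes Combinations 1b, 2b, 3, and 4 for a single respondent. It is a minimal illustration under stated assumptions: the ratings are invented, the `GoalRating` container and the function names are hypothetical, and the difference is taken as aspiration minus attainment exactly as the formulas are written, so that larger values of Combinations 3 and 4 correspond to a larger shortfall of attainment.

```python
from dataclasses import dataclass


@dataclass
class GoalRating:
    """One respondent's ratings for a single goal (hypothetical 7-point scales)."""
    aspiration: int   # GAsp
    attainment: int   # GAtt
    importance: int   # GImp


def combination_1b(ratings):
    """Sum over goals of goal attainment."""
    return sum(r.attainment for r in ratings)


def combination_2b(ratings):
    """Sum over goals of attainment weighted by importance."""
    return sum(r.attainment * r.importance for r in ratings)


def combination_3(ratings):
    """Sum over goals of (aspiration - attainment), signed as in the text."""
    return sum(r.aspiration - r.attainment for r in ratings)


def combination_4(ratings):
    """Sum over goals of importance times (aspiration - attainment)."""
    return sum(r.importance * (r.aspiration - r.attainment) for r in ratings)


# Invented ratings for one respondent on three goals.
respondent = [
    GoalRating(aspiration=6, attainment=4, importance=7),
    GoalRating(aspiration=5, attainment=5, importance=3),
    GoalRating(aspiration=7, attainment=2, importance=5),
]

scores = {
    "1b": combination_1b(respondent),
    "2b": combination_2b(respondent),
    "3": combination_3(respondent),
    "4": combination_4(respondent),
}
print(scores)  # {'1b': 11, '2b': 53, '3': 7, '4': 39}
```

Only Combinations 3 and 4 require the aspiration question, which is the practical cost of the conceptually fuller measures.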

With this in mind, the following framework 
is presented. This, no doubt, has been implicit
in much of the earlier discussion, but it should 
be made explicit at this point. It is an attempt 
to trace through the logical relationships that 
exist between the variables. 


1. Overall job satisfaction (JS) is a function
of the sum (over facets) of the product
of job-facet satisfaction (JFS) and job-facet
importance (JFI).


$$\mathrm{JS} = \sum_{\text{facets}} (\mathrm{JFS} \times \mathrm{JFI})$$


2. For each facet and its corresponding
goals, job-facet satisfaction is a function of
the difference between goal aspiration (GAsp)
and goal attainment (GAtt).


$$\mathrm{JFS} = \mathrm{GAsp} - \mathrm{GAtt}$$


3. Consequently, overall job satisfaction is
a function of the sum (over goals) of the
product of goal importance (GImp) and the
difference between goal aspiration and goal
attainment.


$$\mathrm{JS} = \sum_{\text{goals}} \bigl(\mathrm{GImp} \times (\mathrm{GAsp} - \mathrm{GAtt})\bigr)$$


If this conceptual framework is an accurate 
one, then the researcher can use it as a guide 
in making his decisions about which method 
of combination to use. Combinations 2a and 
4 are most congruent with the conceptual
framework. Combination 4 is probably the 
most elegant, while Combination 2a combines 
elegance with brevity of measuring instrument. 

One problem remains: the elegant combina- 
tions do not appear to be better predictors 
of overall satisfaction than the others. Why 
should this be so? One suggestion that can 
be made here is that the measurement of 
importance may not be well developed. One 
observed tendency is for every respondent to 
report that every goal or facet of the job is 
of equal importance to him (see Ross & 
Zander, 1957), with a consequent restriction 
of the range of variation in the measure. It is 
suggested that new methods of measuring 
importance be established. 
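
The effect of a restricted range can be shown in its limiting case. The sketch below assumes invented data in which every respondent gives every facet the same importance rating; the weighted sum is then a constant multiple of the unweighted sum, so weighting cannot change the correlation with overall satisfaction. The data and the helper `pearson_r` are illustrative assumptions, not a reanalysis of any study cited here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Invented facet-satisfaction ratings: four respondents, three facets.
facet_satisfaction = [[5, 3, 6], [2, 4, 3], [6, 6, 5], [3, 2, 2]]
overall_satisfaction = [5, 3, 6, 2]

# Limiting case: everyone rates every facet as equally important.
importance = [[6, 6, 6] for _ in facet_satisfaction]

unweighted = [sum(row) for row in facet_satisfaction]
weighted = [
    sum(s * w for s, w in zip(sat_row, imp_row))
    for sat_row, imp_row in zip(facet_satisfaction, importance)
]

r_unweighted = pearson_r(unweighted, overall_satisfaction)
r_weighted = pearson_r(weighted, overall_satisfaction)
print(round(r_unweighted, 3), round(r_weighted, 3))
```

With real data the importance ratings are not perfectly constant, but the closer they come to this case, the less the weighted combinations can differ from the unweighted ones.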

Finally, it is suggested that Combination 5 
(summation of the differences between goal 
importance and goal attainment or aspira- 
tion), which has little conceptual meaning, 
should be avoided in situations where overall 
job-satisfaction scores are being computed 
and where individuals are being compared. 
As was pointed out above, such a method has 
its uses; the measurement of overall satisfac- 
tion does not appear to be one of them. 
CONVERGENT AND DISCRIMINANT VALIDITIES BETWEEN
THE CORNELL JOB DESCRIPTIVE INDEX AND
A MEASURE OF GOAL ATTAINMENT ¹


MARTIN G. EVANS 2 


School of Business, University of Toronto 


The convergent and discriminant validities of the Cornell Job Descriptive 
Index and the goal-attainment component of Porter’s need-satisfaction measure 
were evaluated according to the Campbell and Fiske (1959) criteria. The co- 
efficient of concordance (W) was suggested as a statistical test of the fourth
criterion. The scales demonstrated convergent and discriminant validity. 


With the plethora of measuring instruments 
that are available for the measurement of 
job satisfaction, it is important that the rela- 
tionships between such measures be reported 
so that the investigator, wishing to use one 
of these measures, can be aware of its current 
status. This article presents the relationship 
between a well-documented measure, the 
Cornell Job Descriptive Index (JDI), devised 
by Smith and her associates (Hulin, Smith, 
Kendall, & Locke, 1963; Kendall, Smith, 
Hulin, & Locke, 1963; Locke, Smith, Hulin, 
& Kendall, 1963; Macaulay, Smith, Locke, 
Kendall, & Hulin, 1963; Smith, 1963; Smith & 
Kendall, 1963), and a modification of Porter’s 
(1961, 1962, 1963a, 1963b, 1963c) job-satis- 
faction measure. The criteria developed by 
Campbell and Fiske (1959) for convergent 
and discriminant validity will be used to 
determine the strength of relationships be- 
tween the measures. 


METHOD 
Instruments ³


The Job Descriptive Index. This measure taps five
areas of job satisfaction, which are presented in
Table 1. It is a measure about which Vroom (1964)
has commented:


[The JDI] is without doubt the most carefully
constructed measure of job satisfaction in existence
today. ... The extensive methodological work
underlying this measure as well as the available
norms should insure its widespread use in both
research and practice [p. 100].


The format of this measure is quite simple; for
each of the facets of job satisfaction, respondents
are asked whether a series of words or phrases
describes that particular aspect.

The goal-attainment measure. The original instrument
(Porter, 1961, 1962, 1963a, 1963b, 1963c), based
upon Maslow’s (1954) hierarchy of needs, was used
to tap the need satisfaction and need importance of
a large sample (1,900) of American managers. It
has the following form: For a series of need items,
the respondent was asked to rate (on a 7-point
scale): (a) How much is there now? (b) How
much should there be? (c) How important is it to
you? Need satisfaction is the difference between
b and a, that is, a difference between levels of aspiration
and attainment in each need area. Here, two
modifications were made to the instrument. First, and
at an operational level, the items were rewritten so
as to be applicable to a sample of lower-level employees
in organizations. The second change was of a
conceptual nature. The question about level of
aspiration was omitted so the question took on the
characteristics of a goal-attainment and goal-importance
measure;⁴ that is, the respondent was
asked: (a) How much is there now? (b) How important
is it to you? The goals tapped by this measure
are presented in Table 1. The responses to the
first question were taken to be measures of goal
attainment. Each item was assumed to load onto
one or more of the goals. A score for the attainment
of each goal was obtained by summing across the
appropriate items.⁵

The convergent and discriminant validation involved
the JDI scores and the goal-attainment
scores; in other words, comparison was made between
job satisfaction and goal attainment. These should
be related, though not as strongly as might two
different measures of satisfaction (Evans, 1969).


TABLE 1

FACETS OF JOB SATISFACTION AND GOAL IMPORTANCE
TAPPED BY THE MEASURES

Job Descriptive Index:           Goal attainment:
Satisfaction with                Attainment of

Pay                              Pay and fringe benefits
Work itself                      Doing a good job
Opportunities for promotion
Supervision                      Respect from supervision
Fellow workers                   Respect from fellow workers
                                 Improving skills and abilities
                                 Job security
                                 Serving others (hospital only)
                                 Respect from doctors (hospital only)


1 The data reported here were gathered while the
author was a graduate student at Yale University.
The author wishes to acknowledge the helpful comments
provided by the following colleagues at Yale
and elsewhere: Clay Alderfer, Chris Argyris, Lee
Bolman, Vernon Buck, Tim Hall, Ed Lawler, and
Lyman Porter. In addition, in two organizations,
managers, supervisors, and workers gave willingly
of their time: This is greatly appreciated.

2 Requests for reprints should be sent to the author,
School of Business, University of Toronto, 119
St. George Street, Toronto 5, Ontario, Canada.

3 Copies of both instruments (as used for the
utility sample) have been deposited with the American
Society for Information Science. Order NAPS
Document 00196 from ASIS National Auxiliary
Publications Service, c/o CCM Information Sciences,
Inc., 22 West 34th Street, New York, New York
10001; remitting $1.00 for microfiche or $3.00 for
photocopies.

4 See Evans (1969) for some discussion of the
conceptual and operational confusion in the measurement
of goal attainment and need satisfaction.

5 The items and their related goals are to be found
in the NAPS materials.


Samples 


The questionnaires were administered as part of 
a larger study to two groups: 

1. Workers in a public utility (n = 311). This
organization will subsequently be referred to as 
“utility.” 

2. Nurses in a medium-sized general hospital 
(n=88). This organization will subsequently be 
referred to as “hospital.” 


RESULTS 


Convergent and Discriminant Validity 


The two instruments do not have complete 
overlap in the aspects of job satisfaction or 
goal attainment that they purport to measure 
(see Table 1); this complicates the determi- 
nation of validity. Tables 2 (utility) and 3 
(hospital) present the complete intercorrelational
matrices for the instruments. Campbell
and Fiske (1959) have established four
criteria for validity. Of the following criteria, 
No. 1 is for convergent validity; Nos. 2-4, 
discriminant validity. 

TABLE 2

UTILITY: CORRELATIONS BETWEEN GOAL ATTAINMENT AND THE JOB DESCRIPTIVE INDEX

[Correlation matrix not recoverable from the source. Surviving row labels, as extracted: Goal attainment: 1. Pay, Supervision, Fellow workers, Work itself, Skills & abilities, Security, Promotion; JDI: 8. Pay, 9. Supervision, 10. Fellow workers, 11. Work itself, 12. Skills & abilities, 13. Security, 14. Promotion. Surviving entries for the last row: 31, 26, 35, 36, 51. Surviving M and SD pairs, in source order: 4.86, 1.26; 4.93, 1.54; 5.06, 1.23; 4.95, 1.19; 4.45, 1.53; 5.45, 2.02; 11.63, 6.48; 37.89, 12.83; 41.10, 11.80; 32.68, 10.60; 9.93, 7.62.]

Note. Entries in the validity diagonal are circled. Broken line triangles are heterotrait-heteromethod triangles; solid line,
heterotrait-monomethod triangles. n = 311.
ᵃ Scales developed from the goal-attainment measure.
ᵇ Scales in the JDI.


1. Entries in the validity diagonal (circled,
Tables 2 and 3) should be high and significantly
different from zero. In other words,
correlations between the two instruments on
the same variables should be high. Utility:
requirement met. Hospital: requirement met,
though that for the work scale is low (r = .16,
p = .05).

2. A validity diagonal value should be 
higher than values lying in its column and 
row in the heterotrait-heteromethod triangles 
(broken line triangles). As an example, the 
correlation between pay and pay measured 
with each instrument should be higher than 
the correlation between pay measured with 
one instrument and any other variable mea- 
sured with the other instrument. A simple 
sign test was used to evaluate differences 
(Siegel, 1956, pp. 68-74). Utility: require- 
ment met for pay (p= .002), supervision 
(p = .002), and fellow workers (p= .02), 
but not for work itself (p = .254). Hos- 
pital: requirement met for pay (p= .001), 
supervision (p= .001), and fellow workers 
(p = .033), but not for work itself (p = .113). 

3. A validity diagonal value should be 
higher than the corresponding values in the 
heterotrait-monomethod triangles (solid line 
triangles). For example, the correlation be- 
tween pay and pay measured with each instru- 
ment should be higher than the correlation 
between pay measured with one instrument 
and any other variable measured with the 
same instrument. Again, a simple sign test 
was used to evaluate differences.⁶ Utility: the
requirement was met for pay (p = .031), and 
to some extent for supervision (p = .188) 
where the only heterotrait-monomethod cor- 
relation exceeding the validity diagonal value 
is that for supervision-work. Hospital: the 
requirement was met for pay (p = .003), but 
not for the other scales. 

6 The second and third stages of the convergent
and discriminant validity check were made on the
complete correlation matrix.

4. There should be shown the same pattern
of trait interrelationships in all the
heterotrait triangles of both the monomethod
and heteromethod blocks. This is essentially
a question of the ordering of the correlation
coefficients within each block. To test the
degree of agreement between the orderings
in each block, the correlation coefficients were
ranked by size, and the coefficient of concordance,
W (Siegel, 1956, pp. 229-238), was
computed.⁷ Utility: this requirement was met,
W = .59 (p = .05). Hospital: this requirement
was met, W = .57 (p = .05).
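
The two statistical devices used for Criteria 2 through 4 can be sketched briefly. The code below is an illustration under stated assumptions, not the computation carried out on the present data: the correlation values are invented, the function names are hypothetical, the sign test is taken as one-tailed with a chance probability of one half, and the formula for W is the standard one for untied ranks. The significance test for W is not shown.

```python
from math import comb


def sign_test_p(successes, trials):
    """One-tailed sign test: P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def kendalls_w(rankings):
    """Coefficient of concordance for m rankings of the same n objects (no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[j] for r in rankings) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Criteria 2 and 3: count how often an invented validity-diagonal value
# exceeds the invented comparison correlations in its row and column.
diagonal_value = 0.50
comparison_values = [0.31, 0.26, 0.35, 0.36, 0.20, 0.12, 0.41, 0.55, 0.18]
successes = sum(diagonal_value > r for r in comparison_values)
p_value = sign_test_p(successes, len(comparison_values))

# Criterion 4: rank the same six trait pairs within each of three invented
# heterotrait blocks, then measure agreement among the three orderings.
blocks = [
    [0.40, 0.25, 0.10, 0.33, 0.18, 0.05],
    [0.35, 0.30, 0.12, 0.28, 0.15, 0.08],
    [0.22, 0.27, 0.09, 0.31, 0.11, 0.14],
]
w = kendalls_w([ranks(block) for block in blocks])

print(successes, round(p_value, 3), round(w, 2))
```

A value of W near 1 indicates that the trait pairs are ordered in nearly the same way in every block; a value near 0 indicates no agreement.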


CONCLUSION 


The results are encouraging. The criteria 
proposed by Campbell and Fiske (1959) are 
very rigorous. Few of the studies they re- 
ported met all four of their criteria. For the 
JDI and the goal-attainment measure, all four 
criteria were met in at least one of the 
samples. It is highly likely that they are 
valid. The use of the coefficient of concor- 
dance to measure the fourth criterion is an 
advance on the examination of the matrix by 
eye that was proposed by Campbell and 
Fiske (1959). 


7 The fourth stage of the check was made on the 
reduced matrix for which both instruments tap 
common dimensions, that is, Rows 1-4/Columns 1-4, 
Rows 10-13/Columns 1-4, Rows 10-13/Columns 
10-13, for the hospital; and Rows 1-4/Columns 1-4, 
Rows 8-11/Columns 1-4, Rows 8-11/Columns 8-11, 
for the utility. 
A STUDY OF SOME EPPS VARIABLES AS FACTORS OF
ACADEMIC ACHIEVEMENT 


R. P. BHATNAGAR1 
Regional College of Education, Ajmer, India 


The study related six EPPS personality variables to the academic achievement 
of 261 high school students after controlling the effects of socioeconomic status, 
intelligence, school differences, and age differences. It was found that the rela- 
tionship between personality and achievement is tied with age levels, intelli- 
gence, and specificity of academic achievement. The research concluded that 
the EPPS (Hindi) variables do contribute to academic achievement, but 
differentially at different levels of age and intelligence and for different types 
of academic achievement such as arts and science achievement. 


Various aspects of personality such as moti- 
vation, maturation, socialization, emotional 
adjustment and self-fulfillment, self-concept, 
ego-organization, identification processes, etc., 
have been studied as factors of academic 
achievement. A great volume of psychological 
theory has accumulated around personality 
factors of academic achievement. A variety of 
methods has been used in research studies of 
this kind. Mostly, case-study methods and 
group-study methods involving the use of 
psychological tests have been employed. 
Ephron (1953), Mehus (1953), Gann (1945), 
Fernald (1943), Kimball (1953), and Conk- 
lin (1940) have used case-study methods for 
studying factors of academic failure. All of 
them have found that inadequacy of person- 
ality in some form is an essential character- 
istic of the children who are academically un- 
successful. But the results of research studies 
have not been definite. Rather, as was held by
Gowan (1960, p. 91), “The problem appears
more complex than it was first indicated.”
This situation necessitates additional
replications of such studies, especially cross- 
cultural studies. The present study is an at- 
tempt in this direction. 

1 The author is highly indebted to P. E. Vernon of 
the University of London, Institute of Education, 
for the valuable suggestions made on request about 
this piece of research in his letter of May 29, 1963, 
and to Helen M. Walker of Columbia University for 
her valuable guidance regarding statistical analysis 
and treatment of the data provided when she was in 
India as consultant in the National Council of Edu- 
cational Research and Training, Delhi. Requests 
for reprints should be sent to the author, Reader 
in Education, Regional College of Education, Ajmer,
India.

The problem in the present study was to
investigate the relationship that may exist 
between certain personality variables and the 
academic achievement of high school students. 
The personality variables selected for this 
purpose were the personality needs as mea- 
sured by the EPPS developed in Hindi by 
the investigator (Bhatnagar, 1966a). Not all
15 needs were used. Only 6 needs
which were found to differentiate between the 
two groups of underachievers and overachiev- 
ers were considered valid for the purpose 
(Bhatnagar, 1966b). They are need Achieve- 
ment, need Dominance, need Autonomy, need 
Nurturance, need Endurance, and need Ag- 
gression. 


METHOD 
Subjects 


The population in this study has been defined as 
the male students of Class IX of the city of Morada- 
bad, U.P., India. The population was listed in the 
form of intact schools. There were 12 schools in all 
having about 1,000 students in Class IX. 

From the above population, a sample of six schools 
having 612 students was randomly drawn. Not all
612 students were included in the final analysis.
Some students had to be excluded as they did not 
take all the tests, and records of age, etc., were 
not available for them. Control of the socio- 
economic status variable through cross-tabulation 
also reduced the size of the sample. Finally, 261 stu- 
dents were available for the purpose of analysis. 


Data 


The data were collected for all the 612 students. 
These included (a) ages, (b) school marks in the 
form of composite T scores over six subjects, (c) 
scores on the Central Institute of Education (1959) 
test of intelligence, (d) scores on six personality 
needs measured by the EPPS (Hindi), and (e)
scores on a socioeconomic status questionnaire. 

The scores on these six need scales were correlated 
with achievement scores after eliminating the effects 
of intelligence and school differences. The results 
were analyzed separately for each of the three age 
groups into which the total sample was split. 


Design 


An attempt was made to control five factors: 
socioeconomic status, age, intelligence, school differ- 
ences, and sex. Failure to control these factors 
would have caused a distortion of relationship that 
might exist between personality needs and academic 
achievement. The control of these factors was 
achieved in different ways. The sex bias was elimi- 
nated by confining the experiment to the population 
of boys only. Age effects were minimized by splitting 
the sample covering a total age range of 6 yr. into 
three subgroups, each with a 2-yr. age range, and 
analyzing the results separately for each age group. 
Socioeconomic status was controlled through cross- 
tabulation by eliminating the top and the bottom 
extreme groups from the distribution and using only 
the middle group which may be considered roughly 
homogeneous with respect to this variable. School 
differences were eliminated by computing correla- 
tions between personality variables and academic 
achievement by using the within-school sums of 
squares and cross-products and applying the formula 
given by Garrett (1958). The within-schools corre- 
lations are considered unaffected by differences in 
school means. Intelligence was controlled by com- 
puting partial correlations (intelligence being par- 
tialed out) between personality variables and aca- 
demic achievement. 
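
The two statistical controls described above can be sketched as follows. The formula from Garrett (1958) is not reproduced in this article, so the code assumes the standard pooled within-groups correlation (sums of squares and cross-products taken about each school's own means) and the standard first-order partial correlation. The data and function names are invented for illustration.

```python
import math


def within_group_r(groups):
    """Pooled within-group correlation from per-group lists of (x, y) pairs."""
    ss_x = ss_y = sp_xy = 0.0
    for pairs in groups:
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        for x, y in pairs:
            # Deviations are taken from the group's own means, so
            # differences between group means do not contribute.
            ss_x += (x - mean_x) ** 2
            ss_y += (y - mean_y) ** 2
            sp_xy += (x - mean_x) * (y - mean_y)
    return sp_xy / math.sqrt(ss_x * ss_y)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented data: (need score, achievement score) for two schools whose means
# differ sharply but whose within-school relationship is identical.
schools = [
    [(10, 40), (12, 44), (14, 48)],
    [(20, 10), (22, 14), (24, 18)],
]
r_within = within_group_r(schools)

# Invented correlations among need (x), achievement (y), and intelligence (z).
r_partial = partial_r(r_xy=0.5, r_xz=0.5, r_yz=0.5)

print(round(r_within, 3), round(r_partial, 3))
```

In this invented example a correlation computed over the pooled raw scores would be negative because of the difference in school means, while the within-school correlation is perfect; this is the distortion that the within-schools procedure is meant to remove.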


RESULTS 


The results are presented in the following 
sections. The first part contains a discussion 
of the relationship between personality and 
academic achievement in general, while the 
second part considers the relationship when 
specific achievements in arts and science are 
treated separately. 


General Achievement 


When age, intelligence, and the specificity 
of achievement are all disregarded (Table 1, 
total correlations), the nurturance and endur- 
ance needs are positively related and domi- 
nance is negatively related to achievement. 
When intelligence is partialed out and other 
factors are allowed to vary (Table 1, partial 
correlations), only nurturance and endurance 
needs remain significantly related. The dominance
need fails to hold a significant relationship.
Table 2 shows correlations between
EPPS (Hindi) variables and achievement
(general) when intelligence is not partialed
out. Table 3 presents partial correlations between
the same set of variables when intelligence
is partialed out.


TABLE 1

CORRELATIONS BETWEEN PERSONALITY VARIABLES
AND ACHIEVEMENT: AGE AND SPECIFICITY
OF ACHIEVEMENT DISREGARDED

Variable     Total correlations     Partial correlations
n Ach             .056                   .028
n Aff            −.087                  −.017
n Dom            −.128*                 −.032
n Nur             .171**                 .149**
n End             .202**                 .145*
n Agg            −.027                   .025

*p < .05.
**p < .01.

The partialed correlations can be considered
comparatively least affected by the correlated
variables. The nurturance (r = .332,
p < .01; r = .421, p < .01) and endurance
(r = .235, p < .05; r = .221, p < .05) needs
are positively related to general achievement
in the first (15.5-17.5 yr.) and second (13.5-15.5 yr.)
age groups when intelligence is
partialed out (Table 3). But in the third
group of the youngest students, approximately
12.5 yr. old, they are not related. The need
for achievement is positively related to academic
achievement for Group II when intelligence
is not held constant (Table 2), but
loses its significance for that group and becomes
negatively related for the third group
when intelligence is partialed out (Table 3).
Such changes in the position of significant
needs over the age groups and within the age
groups when intelligence is partialed out are
noticed in the cases of need Affiliation and
need Aggression also (compare Tables 2 and
3). This could imply that the variability in
intelligence affects the relationship that might
exist between personality and achievement. It
seems that the relationship between personality
factors and academic achievement is
not of a general nature in the sense that it
is the same at all levels of ability. This could
mean that the relationship of personality with
academic achievement is tied with intelligence.


TABLE 2

CORRELATIONS BETWEEN ACADEMIC ACHIEVEMENT
(GENERAL) AND PERSONALITY VARIABLES:
INTELLIGENCE NOT PARTIALED OUT

Variable    15.5-17.5 yr.ᵃ   13.5-15.5 yr.ᵇ   11.5-13.5 yr.ᶜ   Totalᵈ
n Ach       [illegible]      [illegible]      [illegible]       .056
n Aff       [illegible]      [illegible]      [illegible]      −.087
n Dom       [illegible]      [illegible]      [illegible]      −.128*
n Nur       [illegible]      [illegible]      [illegible]       .171**
n End       [illegible]      [illegible]      [illegible]       .202**
n Agg       [illegible]      [illegible]      [illegible]      −.027

ᵃ N = 76.
ᵇ N = 120.
ᶜ N = 65.
ᵈ N = 261.
*p < .05 (two-tailed).
**p < .01 (two-tailed).


TABLE 3

PARTIAL CORRELATIONS BETWEEN ACADEMIC
ACHIEVEMENT (GENERAL) AND PERSONALITY
VARIABLES: INTELLIGENCE PARTIALED OUT

Variable    15.5-17.5 yr.ᵃ   13.5-15.5 yr.ᵇ   11.5-13.5 yr.ᶜ   Totalᵈ
n Ach         .127             .150            −.256*            .028
n Aff        −.051            −.063             .255*           −.017
n Dom         .153             .198*           −.060            −.032
n Nur         .332**           .421**         [illegible]        .149*
n End         .235*            .221*            .095             .145*
n Agg         .060             .035           [illegible]        .025

ᵃ N = 76.
ᵇ N = 120.
ᶜ N = 65.
ᵈ N = 261.
*p < .05 (two-tailed).
**p < .01 (two-tailed).

It appears from Table 3 that EPPS needs 
contributing to academic success differ from 
one age group to another. In the first age 
group (15.5-17.5 yr.), it is the configuration
of positively related nurturance (r = .332, p
< .01) and endurance (r = .235, p < .05);
in the second (13.5-15.5 yr.), of dominance
(r = .198, p < .05), nurturance (r = .421, p
< .01), and endurance (r = .221, p < .05);
and in the third (11.5-13.5 yr.), of affiliation
(r = .255, p < .05) and negatively related
need Achievement (r = -.256, p < .05).
This suggests that the status of EPPS (Hindi) 
needs as predictors of academic achievement 
is determined by the age level also. It appears
that the relationship of personality with aca-
demic achievement is, probably, not the same
at all age levels.
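
The significance levels attached to the age-group coefficients can be checked roughly with Fisher's z approximation. This is only a sketch: the article does not say which test was applied (a t test is more likely), the adjustment of the standard error for one partialed variable is an assumption, and the function name `two_tailed_p` is hypothetical. The group sizes are those given in the notes to Table 3.

```python
import math


def two_tailed_p(r: float, n: int, n_partialed: int = 1) -> float:
    """Approximate two-tailed p for a partial correlation via Fisher's z.

    The standard error 1 / sqrt(n - 3 - n_partialed) is the usual
    large-sample approximation, used here as an assumption.
    """
    z = math.atanh(r) * math.sqrt(n - 3 - n_partialed)
    return math.erfc(abs(z) / math.sqrt(2))


checks = [
    ("Group I, n Nur", 0.332, 76),
    ("Group II, n Dom", 0.198, 120),
    ("Group III, n Aff", 0.255, 65),
]
for label, r, n in checks:
    print(f"{label}: r = {r:.3f}, N = {n}, approximate p = {two_tailed_p(r, n):.3f}")
```

Under this approximation the nurturance coefficient for the first group falls below .01, and the dominance and affiliation coefficients fall between .01 and .05, in agreement with the levels reported in the text.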

The overall picture of the three age groups 
suggests that almost all the six EPPS (Hindi) 
needs except achievement in the third group 
are positively correlated with academic 
achievement. It may be concluded that per- 
sonality factors do contribute to academic 
achievement at the high school level. How- 
ever, it is surprising that the achievement 
need is not related to academic achievement 
in the first and the second age groups (range, 
from 13.5 to 17.5 yr.) and emerges with a 
significantly negative correlation in the third 
age group of students, approximately 12.5 yr. 
old. A nonsignificant relationship between 
need Achievement and academic performance 
has been found in a few studies (Lowell, 
1950; McClelland, Atkinson, Clark, & Lowell, 
1953), but a negative relationship appears to 
be unprecedented. 


Specific Achievement 


On certain grounds it was hypothesized 
that the relationship between personality 
factors and academic achievement might not 
be uniform for different types of achievement. 
It was thought that the findings obtained 
with reference to achievement in general as 
discussed above might not hold in a situation 
in which success in arts or science is predicted 
separately. For this reason each age group 
was further broken down into two subgroups, 
the arts group and the science group. For 
each of these groups partial correlations be- 
tween EPPS variables and specific achieve- 
ments in arts and science (intelligence hav- 
ing been partialed out) were computed. These 
correlations are shown in Table 4. 

It is observed that the direction and magni- 
tude of relationship of EPPS needs with spe- 
cific achievement vary from arts to science 
group at each age level. 

In the first group need Nurturance emerges
as a predictor of achievement in arts, while
need Achievement predicts achievement in sci-
ence. In the second age group achievement in
arts is unrelated to any of the six personality 
variables, while achievement in science is re- 
lated to two needs, need Achievement and 








110 R. P. BHATNAGAR 
TABLE 4

PARTIAL CORRELATIONS BETWEEN PERSONALITY VARIABLES AND ACADEMIC ACHIEVEMENT
FOR ARTS AND SCIENCE STUDENTS OF DIFFERENT AGE GROUPS

                   Group I                      Group II                     Group III
Variable   Arts    Science  General     Arts    Science  General     Arts    Science      General
n Ach      .106    .375*    .127        .165   -.240     .150       -.220   -.221        -.256*
n Aff     -.229   -.259    -.051       -.115   -.571**  -.063        .020   [illegible]   .255*
n Dom      .213   -.200     .153       -.150   -.129     .198*       .070   -.180        -.060
n Nur      .314*   .210     .332**     -.090    .250*    .421**      .092   -.331*        .151
n End      .106    .210     .235*       .090    .166     .221*       .389*   .027         .095
n Agg     -.022    .050     .060       -.285*  -.040     .035       -.386*   .378*        .121

* p < .05 (two-tailed).
** p < .01 (two-tailed).


Affiliation. In the third age group need Aggres- 
sion is negatively related to achievement in 
arts and positively related to achievement in 
science. Additionally, for the same age group 
endurance is positively related to arts achieve- 
ment and unrelated to science achievement. 
Similarly, for the same age group need Nur- 
turance is negatively related to science 
achievement, but unrelated to arts achieve- 
ment. This suggests that, probably, specificity 
of achievement also affects the status of 
EPPS needs as predictors of academic achieve- 
ment. 

Personality needs are the promising pre- 
dictors of academic achievement in arts and 
science courses at different age levels. 

In the case of the 16.5-yr.-olds need Nur-
turance is significantly correlated with achieve-
ment in arts, the coefficient of correlation be-
ing .314. This implies that probably 9% of
the success variance in arts is dependent on
this need. Achievement in science for the
same age group is predictable from need
Achievement to the extent of 16%, the co-
efficient of correlation being .375. In the sec-
ond group of those approximately 14.5 yr. old
none of the six EPPS variables is a signifi-
cant predictor of achievement in arts. Achieve-
ment in science, however, is significantly cor-
related with the achievement and affiliation
needs. But, both the correlations are nega-
tive. In the third age group of those approxi-
mately 12.5 yr. old the endurance need ap-
pears to predict arts achievement in the posi-
tive direction to the extent of 16%, the corre-
lation being .389. The aggression need does so
in the negative direction (r = -.386). The
science achievement in this group is positively
predictable from need Aggression (r = .378)
and negatively predictable from need Nur-
turance (r = -.331).
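
The "extent" figures in the preceding paragraph rest on the squared correlation coefficient, which gives the proportion of variance two measures share. A minimal sketch of that arithmetic follows; the helper name `shared_variance` is introduced here for illustration only.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance shared by two variables correlated at r."""
    return r * r


coefficients = [
    ("n Nur, arts, Group I", 0.314),
    ("n Ach, science, Group I", 0.375),
    ("n End, arts, Group III", 0.389),
]
for label, r in coefficients:
    print(f"{label}: r = {r:.3f}, r squared = {shared_variance(r):.3f}")
```

Squaring the three coefficients gives roughly .10, .14, and .15, so the percentages quoted in the text are best read as approximate.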


DISCUSSION 


The contribution of EPPS variables to aca- 
demic achievement is presented separately. 


Need for Achievement 


The achievement need fails to predict 
achievement in general for the first and the 
second groups covering students between the 
ages of 13.5 and 17.5 yr. In the third group 
of those approximately 12.5 yr. old it is nega- 
tively related to school success. The achieve- 
ment motivation fails to predict success in arts 
at all age levels. It predicts success in science 
for the first group of approximately 16.5-yr.- 
old students. For the other age groups it fails 
as a predictor. 


Need for Affiliation 


The affiliation motive is positively related 
to general achievement in the third age group 
of 12.5-yr.-old students. In the case of spe- 
cific achievement in science it is negatively 
related for the second age group. It does not 
predict arts achievement at any age level. 


Need for Dominance 


The dominance need is related to general 
achievement only for the second age group. 


Need for Nurturance


The nurturance need emerges as a positive 
predictor of general achievement for the first 
and the second group. For the third group it 
fails to predict general achievement. It is 
positively related (r = .314, p < .05) to arts 
achievement for the first age group (15.5-17.5 
yr.). At other age levels it is not related to 
achievement in arts. It is negatively related 
(r = -.331, p < .05) to science achievement
for the third group and positively related for 
the second group. It does not predict achieve- 
ment in science for the first age group. 


Need for Endurance 


The endurance need is related to general 
achievement in the first (15.5-17.5 yr.) and 
the second (13.5-15.5 yr.) age groups, but 
not to the third age group. It does not pre- 
dict science achievement for any age group. 
In the case of achievement in arts it is posi- 
tively related (r = .389, p < .05) only to the 
third group (11.5-13.5 yr.). In other age 
groups it fails to predict achievement in arts. 


Need for Aggression 


The aggression need is not related to 
achievement in general at any age level. With 
achievement in arts it emerges as negatively 
related (r = —.285, —.386, respectively) to 
the second and third age groups. It does not, 
however, predict arts achievement for the first 
group. With achievement in science in the 
third age group it correlates positively 
(r = .378, p < .05). In other age groups it
does not correlate with science achievement. 


EXPERIENCE AND PRIOR PROBABILITY IN A COMPLEX
DECISION TASK * 


MICHAEL H. STRUB 2 


Human Performance Center, Ohio State University 


Six experienced and six naive Ss evaluated probabilistic data, determined sources 
of data generation, and predicted subsequent data in a complex decision task. 
Experience and prior probability were combined factorially. Results indicated 
that experienced Ss (a) were less conservative data evaluators, (b) determined 
data sources on the basis of fewer data samples, (c) were more sensitive to 
prior-probability values, and (d) adopted a maximization strategy in predic- 
tion more consistently than did naive Ss. The importance of using trained 
personnel in the evaluation of realistic decision capabilities and the need for 
caution in generalizing from data obtained from naive Ss who serve in most 
laboratory studies of decision making were discussed. 


The process of decision making usually in- 
volves evaluation of data from the environ- 
ment and subsequent selection of action with 
respect to the environment. Evaluation of 
data can be defined as a determination of 
the extent to which a piece or pieces of evi- 
dence (data) favor the truth of one state of 
the environment over others, while action 
selection is a response made after a state of 
the environment has been judged true by the 
decision maker. For example, a falling barom- 
eter and high humidity are data predictive 
of rain. The carrying of an umbrella by the 
decision maker indicates that he has judged 
rain to be the projected state of the environ- 
ment, perhaps, in part, because he has 
observed these data. 

It has often been found in laboratory situ- 
ations that decision makers are conservative 
data evaluators (for a discussion of
“conservatism” see Peterson & Beach, 1967). In
addition, they tend to purchase more data
than is formally optimal prior to action selec- 
tion (Irwin & Smith, 1957; Lanzetta & 
Kanareff, 1962; Swets & Green, 1961). How- 
ever, nonlaboratory decision makers, such as 
those involved in military command-control 
systems, are considerably more experienced 
than Ss who serve in most laboratory studies 
of decision making. In view of this fact, it 
becomes meaningful to ask whether experi- 
enced decision makers also display conserva- 
tism in evaluating and purchasing data. Ac- 
cording to Edwards (1966), evidence bearing 
upon this issue is sparse. One purpose of the 
present study, therefore, was to investigate 
the importance of experience in decision 
making. 

Another characteristic of real-world deci- 
sion situations is the presence of a data his- 
tory. In many laboratory simulations of 
decision making (e.g., Kaplan & Newman, 
1966), S begins with complete ignorance of 
the environment: He does not regard any state 
as more likely than any other, and he is void 
of any experience with data upon which to 
base a prediction of the true state. In reality, 
people are rarely so poorly equipped when 
they enter the decision situation. Prior knowl- 
edge that there is a predominance of rainy 
over clear weather in a given location, for 
example, may well be combined with the 
barometer and humidity check in deciding 
to carry an umbrella. Assuming that the data 
to be processed do not conflict with prior 
knowledge (e.g., the barometer and humidity 
data do not favor sunshine), incorporation
of prior-probability information into the ag- 
gregate prediction should encourage action to 
be taken after collection of fewer diagnostic 
data than would otherwise be the case. Such 
an effect was suggested by the descriptive 
data of Green, Halbert, and Minas (1964), 
but failed to appear in a direct investigation 
of prior probability and data purchase re- 
ported by Messick (1964). One reconciling 
explanation is that Ss had more experience in 
the type of task employed by Green et al. 
Experience may be a moderator variable 
which increases the decision maker’s sensitiv- 
ity to prior probability. Another objective of 
the present experiment, then, was to test this 
hypothesis directly by combining prior prob- 
ability and experience factorially within the 
same design. 
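
The argument that prior-probability information should reduce the amount of data needed before acting can be illustrated with Bayes' theorem in odds form. The sketch below is an idealization and not the procedure of the present study: it assumes two bowls of 60/40 and 70/30 composition, an uninterrupted run of red draws, and a hypothetical stopping criterion of .95. The function name `red_draws_needed` and its parameters are likewise assumptions.

```python
def red_draws_needed(prior_b: float, p_red_a: float = 0.60,
                     p_red_b: float = 0.70, criterion: float = 0.95) -> int:
    """Count consecutive red draws needed before the posterior probability
    of bowl B (the bowl richer in red) reaches the criterion."""
    odds = prior_b / (1 - prior_b)           # prior odds in favor of bowl B
    target = criterion / (1 - criterion)     # odds equivalent of the criterion
    likelihood_ratio = p_red_b / p_red_a     # evidence carried by one red draw
    n = 0
    while odds < target:
        odds *= likelihood_ratio
        n += 1
    return n


for prior in (0.5, 0.9):
    print(f"prior = {prior:.1f}: {red_draws_needed(prior)} consecutive red draws")
```

Under these assumptions an ideal reviser needs far fewer confirming data when the prior already favors the correct bowl, which is the effect the hypothesis predicts for Ss who are sensitive to prior probability.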

The familiar ball and urn task which is 
commonly used to assess data evaluation by 
human decision makers requires S to decide, 
on the basis of samples of balls drawn, which 
of two urns is being sampled: one containing 
predominantly red balls or one containing pre- 
dominantly black balls. This task becomes a 
probability-learning task if S is instructed
actually to predict the color of the next draw. 
Correct response maximization in probability 
learning is achieved by the exclusive predic- 
tion of the more frequent datum, a strategy 
which Ss rarely adopt except under conditions 
of monetary payoff and extended training 
(Edwards, 1961). If Ss were to make such 
predictions while concurrently evaluating the 
data, they might well be more prone to exploit 
the maximization rule, especially if they were 
experienced in data evaluation. A final objec- 
tive of the present research, then, was to dis- 
cover if prediction behavior is influenced by 
prior and concurrent experience in data 
evaluation. 
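
The advantage of maximization over probability matching can be made concrete. If the more frequent datum occurs with probability p on each independent draw, always predicting it is correct with probability p, whereas predicting each color in proportion to its frequency is correct with probability p² + (1 - p)². The function names below are hypothetical, and p = .70 is used only as an example.

```python
def accuracy_maximizing(p: float) -> float:
    """Expected proportion correct when the more frequent datum is always predicted."""
    return max(p, 1 - p)


def accuracy_matching(p: float) -> float:
    """Expected proportion correct when predictions match the outcome frequencies."""
    return p * p + (1 - p) * (1 - p)


p_red = 0.70  # example value, e.g., sampling from a 70/30 bowl
print(f"maximizing: {accuracy_maximizing(p_red):.2f}")
print(f"matching:   {accuracy_matching(p_red):.2f}")
```

The two strategies coincide only when p = .50 or p = 1; everywhere in between, matching gives up some expected accuracy.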


METHOD 
Subjects 


Six naive and six experienced male college stu- 
dents served as Ss. The latter had all served pre- 
viously as decision makers in an ongoing simulation 
project devoted to the study of probabilistic infor- 
mation processing in command-control systems (see 
Southard, Schum, & Briggs, 1964). In conjunction 
with this program they received approximately 114 
hr. of lecture sessions, demonstrations, problem-
solving sessions, and on-the-job training in an effort
to maximize their proficiency in dealing with proba- 
bilistic information. In addition, these Ss had par- 
ticipated in a variety of probability-estimation 
experiments over a period of at least 3 mo. prior to 
this study. The naive Ss, selected for the ongoing 
simulation project on the basis of the same criteria 
as the experienced Ss, had received none of this 
formal training and had served in only one previous 
probability-estimation study of less than 1-wk. 
duration. 

The experienced Ss were paid a base rate of $1.50 
to $1.85 an hour according to their seniority in the 
project mentioned above, while all naive Ss received 
a base rate of $1.25 an hour. It should be noted 
that in order to obtain realistically experienced Ss 
it was necessary to accept a degree of experimental 
confounding. Such Ss, by virtue of their experience, 
were (a) employed on a more permanent basis than 
the naive ones and (b) paid at a higher rate than 
could be justified for naive ones. 

In addition to base pay, all Ss were rewarded or 
penalized in accordance with their performance on 
the two primary task requirements: action selection 
and prediction. For action selection, an optional stop- 
ping scheme was used in which a correct choice 
was rewarded by $.10 minus $.001 for each datum 
drawn (red or black marble). An incorrect choice 
was penalized by $.10 plus $.001 for each datum 
drawn. For the prediction task, each correct anticipa- 
tion of the next datum was rewarded by $.001 with 
no penalty for incorrect prediction. 
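
The payoff rules just described can be written out as a short sketch. Amounts are kept in thousandths of a dollar to avoid rounding; the function names are introduced here for illustration and are not part of the original procedure.

```python
def action_payoff_mills(correct: bool, n_drawn: int) -> int:
    """Payoff for the action selection, in thousandths of a dollar.

    A correct choice earns $.10 minus $.001 per datum drawn; an incorrect
    choice loses $.10 plus $.001 per datum drawn.
    """
    return 100 - n_drawn if correct else -(100 + n_drawn)


def prediction_payoff_mills(n_correct_predictions: int) -> int:
    """Payoff for the prediction task: $.001 per correct prediction, no penalty."""
    return n_correct_predictions


# Example: S stops after 20 draws and makes 70 correct predictions in the series.
print(action_payoff_mills(True, 20), action_payoff_mills(False, 20))
print(prediction_payoff_mills(70))
```

Each additional datum therefore costs the same small amount whether the eventual choice is right or wrong, so delaying the action is worthwhile only if it raises the chance of a correct choice.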


Procedure 


The following instructions acquainted Ss with the 
task characteristics: 


I have two bowls, each of which contains a total 
of 100 red and black marbles. I am going to select 
a bowl and begin drawing from it, one at a time. 
After each draw, the marble is put back in the 
bowl. Before I begin to draw, you will be told 
the composition of each of the two bowls such as 
60/40, 70/30. The first number of each composition 
is always with reference to the number of red 
marbles, and the second, the number of black. You 
will also be told the prior probability of selecting 
one or the other bowl. 


Sampling was then initiated, and S indicated, after 
each draw (trial), which bowl (proportion) he 
thought was favored by the current accumulation of 
data and how confident he was in his evaluation. 
Both responses were made by marking response 
sheets: 1 and 2 were used to designate bowls, and 
probability values (.50-.99) were used to indicate 
confidence. Following this evaluation, S was required 
to indicate whether the next marble would be red 
(R) or black (B), again by marking his response 
sheet. At any trial in each series of 100 trials, S 
could make his single action decision by selecting 
1 or 2 as the true source of data (bowl) and accept-
ing the consequences (payoff or loss). The E re-
corded the trial number on which S made his action
selection, but S continued to evaluate and predict 
data through all 100 samples. By the end of sampling 
there was usually little doubt as to the identity 
of the correct state (bowl), although no feedback 
was given directly until after the experiment. 


Apparatus 


The drawing of red and black marbles was simu- 
lated by an IBM 1401 computer programmed to 
generate random sequences according to the specified 
prior probability (50-50 or 90-10) and the bowl 
compositions (marble proportions of 60/40-70/30 or 
80/20-90/10). Data were presented to Ss via a 
closed-circuit TV system. As each sample was pre- 
sented, it registered on one of two adjacent counters 
which kept cumulative totals of each sample (thus 
providing a realistic “history” upon which to base 
decisions). 
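
A sequence of the kind described can be sketched as follows. This is not the original IBM 1401 program, which is not reproduced in the article; the function names, the seed, and the assignment of the 90-10 prior to the first bowl are all assumptions. The second function gives the Bayesian posterior for the first bowl from the two cumulative counts shown on the counters.

```python
import random


def simulate_draws(prior_bowl1, p_red_bowl1, p_red_bowl2, n_draws=100, seed=0):
    """Pick a bowl according to the prior, then sample marbles with replacement."""
    rng = random.Random(seed)
    bowl = 1 if rng.random() < prior_bowl1 else 2
    p_red = p_red_bowl1 if bowl == 1 else p_red_bowl2
    draws = ["R" if rng.random() < p_red else "B" for _ in range(n_draws)]
    return bowl, draws


def posterior_bowl1(prior_bowl1, p_red_bowl1, p_red_bowl2, reds, blacks):
    """Posterior probability of bowl 1 given the cumulative red and black counts."""
    like1 = p_red_bowl1 ** reds * (1 - p_red_bowl1) ** blacks
    like2 = p_red_bowl2 ** reds * (1 - p_red_bowl2) ** blacks
    numerator = prior_bowl1 * like1
    return numerator / (numerator + (1 - prior_bowl1) * like2)


# 90-10 prior with the 60/40-70/30 proportion set (assumed pairing).
bowl, draws = simulate_draws(0.9, 0.60, 0.70)
reds, blacks = draws.count("R"), draws.count("B")
print(bowl, reds, blacks, round(posterior_bowl1(0.9, 0.60, 0.70, reds, blacks), 3))
```

Because sampling is with replacement, only the two running totals matter for the posterior, which is why cumulative counters give S all the information an ideal reviser would use.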


Design 


The design consisted of a between-S variable 
(experience) and two within-S variables (prior prob- 
ability and bowl proportion set), each administered 
at two levels. Each prior-probability and proportion- 
set combination occurred in 12 sequences; thus, a 
total of 48 sequences was viewed by each group. 
These were administered in 10 experimental sessions, 
each requiring about 4 hr. Since sequences within 
each combination varied according to a random- 
sampling procedure, any learning effects were con- 
founded with the relative difficulty of the sequence. 
Thus, all trial and trial-interaction effects were col- 
lapsed into a single error estimate in the analyses. 
Dependent variables analyzed included final prob- 
ability estimates (after 100 samples), number of 
samples purchased, and degree of maximization in 
prediction (i.e., the proportion of trials on which 
the more frequent event was predicted). 
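
The layout of the within-S conditions and the maximization index can be sketched briefly. The condition labels are taken from the text; the variable and function names are hypothetical.

```python
from itertools import product

PRIORS = ["50-50", "90-10"]
PROPORTION_SETS = ["60/40-70/30", "80/20-90/10"]
SEQUENCES_PER_CELL = 12

# Every prior-probability level is crossed with every proportion set.
cells = list(product(PRIORS, PROPORTION_SETS))
total_sequences = len(cells) * SEQUENCES_PER_CELL
print(len(cells), "cells,", total_sequences, "sequences")


def maximization_proportion(predictions, more_frequent):
    """Proportion of trials on which the more frequent event was predicted."""
    return sum(1 for p in predictions if p == more_frequent) / len(predictions)


# Example: a hypothetical S predicts red on 8 of 10 trials when red is more frequent.
print(maximization_proportion(list("RRRBRRRRBR"), "R"))
```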


RESULTS AND DISCUSSION 
Final Probability Estimates 


Edwards (1966), in a discussion of the 
parametric difficulties involved in analyzing 
subjective probability scores, made a strong 
case for transformation of these scores to 
log-odds or log-likelihood ratios. Accordingly, 
S’s final subjective probability values were 
transformed here to log-odds form, where 
log odds = log (S’s p) - log (1 - S’s p). Figure
1 represents the mean log odds for groups 
obtained across prior-probability and propor- 
tion-set conditions. 
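
The transformation can be sketched in a few lines. The base of the logarithm is not stated in the text, so base 10 is an assumption here, and the probabilities in the loop are example inputs rather than individual Ss' scores.

```python
import math


def log_odds(p: float) -> float:
    """Log odds of a subjective probability p (base 10 assumed)."""
    return math.log10(p) - math.log10(1 - p)


for p in (0.50, 0.86, 0.94, 0.95):
    print(f"p = {p:.2f}, log odds = {log_odds(p):.3f}")
```

The transformation stretches the scale near certainty: equal steps in probability close to 1.0 correspond to increasingly large steps in log odds, which is one reason averages of raw probabilities and averages of log odds can tell different stories.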

An analysis of variance performed on these 
data indicated that the overall final subjective 
probability was significantly higher for the 
experienced than for the naive group, F =
8.68, df = 1/10, p < .025, and that Ss ex-
pressed significantly more certainty under
90-10 than 50-50 prior probability, F = 10.61, 
df = 1/10, p< .01. Proportion set was also 
significant, F = 56.25, df = 1/10, p< .001, 
suggesting that Ss found the data samples in 
the 80/20-90/10 set to be more diagnostic 
than those in the 60/40-70/30 set. However, 
the main effects of prior probability and 
proportion set must be interpreted in light of 
the Prior Probability × Proportion Set inter-
action which also achieved significance, F =
8.25, df = 1/10, p < .025. The interaction is 
apparent in Figure 1: The increase in cer- 
tainty across prior probability is restricted 
to the 60/40-70/30 proportion-set conditions. 

The finding that the experienced Ss’ final 
subjective probabilities were consistently 
higher than the naive Ss’ suggests that the 
experienced Ss were less conservative than the 
naive Ss. However, any comparison of the 
present phenomenon with previous studies of 

FIG. 1. Final subjective probability values trans-
formed to log odds for experienced and naive Ss
across prior probability and proportion set.

conservatism must be made cautiously. The
present study focused on final subjective 
probability estimates, while conservatism is 
more commonly inferred from a comparison 
of the amount of datum-to-datum revision 
produced by Ss with that produced by Bayes’ 
theorem. The only comparative estimates ob- 
tained in the present study were those involv- 
ing the average of final probability estimates 
provided by experienced Ss, by naive Ss, and 
by the Bayes model. The mean of these final 
certainty estimates was .95 for Bayesian re- 
visions, .94 for the experienced group, and .86 
for the naive group. While such a compari- 
son suggests that the experienced group 
was nearly “Bayesian” in its probabilistic 
estimates, it should be recognized that 
conclusions based upon average subjective 
probabilities are tenuous at best (Edwards, 
1966). 

The finding that Ss were more confident 
about the 80/20-90/10 than the 60/40-70/30 
proportion set was accompanied by higher 
final Bayesian revisions within the former set, 
indicating that the randomly generated se-
quences were more diagnostic in the 80/20-
90/10 set. The Prior Probability × Propor-
tion Set interaction suggests that in the more
diagnostic proportion set the impact of the 
data was sufficient to eliminate the initial 
diagnostic value of the 90-10 prior probabil- 
ity, while in the less diagnostic proportion set 
prior probability still influenced probability 
estimates after 100 samples. 


Number of Samples Purchased 


Figure 2 indicates the mean number of 
samples purchased for each group across prior 
probability and proportion set. The results of 
an analysis of variance of these data yielded 
an experience effect which approached signifi-
cance, F = 4.32, df = 1/10, p < .07, and a
significant main effect of prior probability,
F = 5.06, df = 1/10, p < .05. An Fmax test
of the total variability within each group re-
vealed heterogeneity of variance, F = 1.57,
df = 288, p < .01. It was decided that the
extensive intergroup variability warranted an 
analysis of the data by group means and by 
separate within-group analyses. Group means 
were compared by an exact randomization 

FIG. 2. Mean number of samples purchased for
experienced and naive Ss across prior probability
and proportion set.


test (Kempthorne, 1955; McHugh, 1963). 
For each level of prior probability and dis- 
tribution set, the experienced group purchased 
fewer samples than the naive group. This 
outcome arrangement resulted in a signifi-
cance level of .0625 which, based on the
probability array of 2^4 or 16 outcomes, is the
minimum value obtainable in the test (since
1/16 = .0625).
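
The minimum significance level of the randomization test can be reproduced by enumeration. The sketch assumes, as the text implies, that the test treats the direction of the group difference in each of the four conditions as equally likely to go either way under the null hypothesis; the variable names are hypothetical.

```python
from itertools import product

N_CONDITIONS = 4  # 2 prior-probability levels x 2 proportion sets

# +1: the naive group bought more samples in that condition; -1: the reverse.
arrangements = list(product((+1, -1), repeat=N_CONDITIONS))

# Observed result: the experienced group bought fewer samples in every condition.
as_extreme = [a for a in arrangements if all(sign > 0 for sign in a)]

p_value = len(as_extreme) / len(arrangements)
print(len(arrangements), p_value)
```

With only four conditions, a perfectly consistent ordering is the most extreme outcome available, so no smaller probability can be obtained from this test.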

In an analysis of variance performed on the 
number of samples purchased by the naive Ss 
alone, obtained F values for all effects were 
< 1.00. In contrast, a similar analysis applied 
to the experienced-group scores showed the 
effect of prior probability to be highly signifi-
cant, F = 7.82, df = 1/269, p < .001. Since
the Prior Probability × Ss interaction was not
significant, F = 1.27, df = 5/264 (p value illegible), it
was pooled with the error variability in order 
to gain df in the denominator of the F test 
for the prior-probability effect (Winer, 1962, 
p. 203). No other main or interaction effect 
was significant for the experienced group. 

The results of the separate group analyses 
of the number of samples purchased are par- 
ticularly interesting in that they provide an 
explanation for the discrepancy between the
observation of Green et al. (1964) that prior
probability does influence action selection and 
Messick’s (1964) finding that it does not. The 
essential difference may have been one of 
experience. The present findings, then, sup- 
port the hypothesis that training is a pre- 
requisite for sensitivity to prior probability. 


Predictions of the More Frequent Datum 


In order to assess the tendency to adopt 
the optimal strategy in the prediction of the 
next sample (maximization), a chi-square test 
was performed on the frequency of adoptions 
of the optimal strategy. From a total of
288 opportunities to adopt the optimal
strategy within each group (i.e., 6 Ss × 48
sequences), maximization occurred 225 times
(or 78%) for experienced Ss as compared
with 102 times (35%) for naive Ss. The dif-
ference was highly significant, χ² = 46.27,
df = 1, p < .001. However, this finding does
not imply that the naive Ss’ predictions were 
characterized by a probability-matching rule. 
A chi-square test of the naive Ss’ predictions 
of the more likely sample against the expected 
frequencies indicated that even the naive Ss 
predicted the more likely sample more fre-
quently than it actually occurred, χ² = 22.06,
df = 1, p < .001.
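
The counts and percentages in the preceding paragraph can be checked directly. The sketch assumes, as stated in the text, one opportunity per S per sequence; the reported percentages are consistent with 6 × 48 = 288 opportunities in each group.

```python
N_SUBJECTS_PER_GROUP = 6
N_SEQUENCES = 48
opportunities = N_SUBJECTS_PER_GROUP * N_SEQUENCES

maximization_counts = {"experienced": 225, "naive": 102}
for group, count in maximization_counts.items():
    percent = 100 * count / opportunities
    print(f"{group}: {count}/{opportunities} = {percent:.0f}%")
```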

The overall finding of a tendency to maxi- 
mize is not too surprising in view of the fact 
that payoff was used (see Luce & Suppes, 
1965), and instructions specified that the se- 
quence was randomly generated (McCracken, 
Osterhout, & Voss, 1962; Nies, 1962; Peter- 
son & Ulehla, 1965). However, the fact that 
in the present study experienced Ss (trained 
under realistic circumstances) came much 
closer to a consistent maximization strategy 
than did naive ones suggests very strongly 
that real-life decision makers are less sus- 
ceptible to maladaptive strategies (such as 
probability matching) than might be expected 
from studies conducted using college sopho- 
mores. Edwards (1961) has clearly demon- 
strated the importance of specific experience 
in probability-learning behavior; this study 
broadens the area of relevant experience to 
data evaluation and action selection. 

Again, however, it should be pointed out 
that the present experienced Ss were members 
of a team in an ongoing research project and,
as such, may well have brought into the pres- 
ent situation characteristics other than sheer 
decision-making experience to differentiate 
them from naive Ss. Perhaps, for example, 
experienced Ss viewed the experiment as a 
temporary assignment and tended to accept 
the tedious task more willingly than did the 
naive Ss. Or, because of their more permanent 
employment (and higher rate of base pay), 
they may have been more highly motivated. 
In any case, it seems reasonable to argue that 
whatever may be confounded with experience 
in the present study, these same factors are 
likely to characterize real-life decision makers 
and their training. Therefore, perhaps the 
most important conclusion to be drawn from 
this work is not the importance of experience 
upon various facets of decision making, but 
the importance of realistic training in work 
purporting to represent general principles of 
decision behavior. 


CONCLUSION 


The data from the present study indicate 
that experience gained from training received 
under realistic circumstances is an important 
determinant of subsequent laboratory decision 
or choice behavior. First, the final subjective 
probabilities of the experienced group were 
consistently higher than those of the naive 
group for various levels of prior probability 
and proportion set. Second, the experienced 
group purchased fewer samples in each Prior 
Probability × Proportion Set condition than
did the naive group. Third, only the experi- 
enced group was affected by prior-probability 
values, strongly suggesting that experience 
acts as a moderator variable which increases 
the decision maker’s sensitivity to such diag- 
nostic cues. Finally, the experienced Ss tended 
to maximize (choose the more frequent event 
consistently) in a probability-learning situa- 
tion to a greater extent than did the naive Ss. 

All of these findings point to a general 
conclusion regarding the applicability of lab- 
oratory-based principles of decision making to 
real-life situations: unless training is realistic, 
the behavior of laboratory Ss may consider- 
ably underestimate human capabilities in a 
number of aspects of the decision process.


REFERENCES


Edwards, W. D. Introduction to special issue on re-
vision of opinions by men and man-machine sys-
tems. IEEE Transactions on Human Factors in
Electronics, 1966, HFE-7, 1-6.

Edwards, W. D. Probability learning in 1000 trials.
Journal of Experimental Psychology, 1961, 62, 385-
394.

Green, P. E., Halbert, M. H., & Minas, J. S. An
experiment in information buying. Journal of
Advertising Research, 1964, 4, 17-23.

Irwin, F. W., & Smith, W. A. S. Value, cost and
information as determiners of a decision. Journal of
Experimental Psychology, 1957, 54, 229-231.

Kaplan, R. J., & Newman, J. R. Studies in proba-
bilistic information processing. IEEE Transactions
on Human Factors in Electronics, 1966, HFE-7,
49-63.

Kempthorne, O. The randomization theory of ex-
perimental inference. Journal of the American
Statistical Association, 1955, 50, 946-967.

Lanzetta, J. T., & Kanareff, Z. T. Information
cost, amount of payoff and level of aspiration as
determinants of information seeking in decision
making. Behavioral Science, 1962, 7, 459-473.

Luce, R. D., & Suppes, P. Preference, utility, and
subjective probability. In R. D. Luce, R. R. Bush,
& E. Galanter (Eds.), Handbook of mathematical
psychology. Vol. 3. New York: Wiley, 1965.

McCracken, J., Osterhout, C., & Voss, J. D. Effects
of instructions in probability learning. Journal of
Experimental Psychology, 1962, 64, 267-271.

McHugh, R. B. Comments on “Scales and statistics:
Parametric and nonparametric.” Psychological Bul-
letin, 1963, 60, 350-355.

Messick, D. N. Sequential information seeking: Ef-
fects of the number of terminal acts in prior infor-
mation. Electronic Systems Division Technical
Documentary Report, 1964, No. 64-606.

Nies, R. C. Effects of probable outcome information
on two-choice learning. Journal of Experimental
Psychology, 1962, 64, 430-433.

Peterson, C. R., & Beach, L. R. Man as an intuitive
statistician. Psychological Bulletin, 1967, 68, 29-46.

Peterson, C. R., & Ulehla, L. J. Sequential patterns
and maximizing. Journal of Experimental Psychol-
ogy, 1965, 69, 1-4.

Southard, J. F., Schum, D. A., & Briggs, G. E. An
application of Bayes’ theorem as a hypothesis-
selection aid in a complex information-processing
system. USAF AMRL Technical Documentary Re-
port, August 1964, No. 64-51.

Swets, J. A., & Green, D. M. Sequential observa-
tions by human observers of signals and noise. In
C. Cherry (Ed.), Information theory. London:
Butterworth, 1961.

Winer, B. J. Statistical principles in experimental
design. New York: McGraw-Hill, 1962.


(Received March 13, 1968) 


Journal of Applied Psychology


1969, Vol. 53, No. 


118-123 
PUNITIVE SUPERVISION AND PRODUCTIVITY: 
AN EXPERIMENTAL ANALOG 1

This research concerns one unexplored aspect of the relationship between
supervision and worker productivity: the manner in which the supervisor’s
activities are scheduled. A laboratory setting provided an analog to the
supervisor’s use of one type of consequence, punishment, to maximize the
amount of time a worker spends in task activity while minimizing various
unauthorized behaviors. The setting involved two concurrent operants
reinforced with money where work on the higher paying one was penalized at
various intervals. The effects of variations in the schedule of these
intervals and the size of the penalties were explored. The results indicated
that penalty magnitude significantly affected the allocation of work time
when the penalties occurred at unequal intervals but not at equal ones.
Under the unequal condition, the higher the penalties the less time spent on
the punished task and the greater the time on the unpunished one. Low and
moderate penalties, however, produced less work on the punished task than
would be predicted on the basis of the possible losses through penalties.


As generally understood, supervision in- 
volves various activities which bear directly 
or indirectly on the job performance of the 
supervised individual: job planning, delega- 
tion of duties, communication of orders, and 
enforcement of work rules. The focus of a 
number of studies involving a variety of 
types of work groups has been the effects of 
the presence or absence of such activities 
or their combinations on worker productivity 
(Argyle, Gardner, & Cioffi, 1957; Coch &
French, 1948; Day & Hamblin, 1964; 
Gouldner, 1954; Katz, Maccoby, Gurin, & 
Floor, 1951; Katz, Maccoby, & Morse, 1950; 
Likert, 1961). 

Supervision, however, is characterized by
more than simply the presence or absence
of various activities. The supervisor’s choice
of activities constitutes only one of the
dimensions of what may be defined as his style
of supervision. Of additional importance,
although largely unexplored, may be the manner
in which these activities are scheduled. Two
characteristics define the schedule of an
activity: its frequency and its regularity. Thus
any supervisory activity can occur at various
frequencies and at intervals which may be
either regular or irregular.

1 This study was supported by the Cooperative
Research Program of the Office of Education (Project
No. S-319) and by the Graduate Research Committee
of the University of Wisconsin. The author wishes
to thank Lois Loddeke for her assistance in the
research and L. Keith Miller and Robert Shotola
for their suggestions and criticisms.

2 Requests for reprints should be sent to the
author, Department of Sociology, University of
Washington, Seattle, Washington 98105.

The potential effects of schedules would 
appear to be greater for some supervisory 
activities than for others. For activities such 
as job planning which usually occur infre- 
quently and involve little interpersonal con- 
tact, the effects may be slight. However, for 
those which occur often and involve inter- 
action between the supervisor and worker, the 
effects may be substantial. For example, a 
common function of supervision is to control 
the amount of work activity on an assigned 
job. In many settings supervisors “check up” 
on a subordinate to ensure that he is following 
his assignment. The importance of the fre- 
quency of such checkups has been suggested 
in research by Katz and his associates (Katz, 
Maccoby, Gurin, & Floor, 1951; Katz, Mac- 
coby, & Morse, 1950) comparing the effects 
of close and general styles of supervision. 
In these studies supervisors of the less pro- 
ductive workers were found to be more likely 
to use close supervision involving frequent 
checkups and task instructions. In explanation,
Kahn and Katz (1960) suggest that
most workers desire maximum autonomy and
that supervision in a manner that does not 
permit it leads to lower morale and motiva- 
tion. Other research, however, suggests that 
these effects may be limited to certain types 
of settings and production technologies 
(Argyle et al., 1957; Dubin, 1965). 

Although unexplored, the regularity of the 
supervisory activities might have other im- 
portant effects on work patterns. For example, 
in the use of supervisory checkups to ensure 
job performance, it might be predicted that 
regular checks will be less effective than ir- 
regular ones. With regular checkups the 
worker may learn when he needs to be present 
to coincide with the appearance of the super- 
visor, and thus may spend little additional 
time on the job. With irregular checkups, 
however, he may find such anticipation dif- 
ficult or impossible, and thus must remain on 
the job for longer periods. Examples such as 
these suggest the potentially important effects 
of the schedule of an activity in supervisory 
situations and recommend its more systematic 
investigation in evaluating the effectiveness 
of various supervisory practices. 

The general lack of research on schedule 
as an element of supervision style may have 
been dictated in part by the field research 
techniques that have typically been used in 
previous research on supervision. In general, 
field methods do not permit the measure- 
ment and control necessary to determine the 
effects which this aspect of supervision may 
have on productivity even though under some 
conditions it may determine the effectiveness 
of the supervisory activity. 

The effects of the schedules of various 
consequences have been studied, however, in 
the experimental laboratory where sufficient 
measurement and control may be obtained. It 
may prove desirable, then, first to describe the 
effects of this variable experimentally, and 
then to determine the extent to which the 
results may be generalized to nonexperimental 
supervisory situations. 

In the experimental study of task choice, 
a minimal task situation has been developed 
which permits the introduction of several 
conditions which appear to be functionally 
analogous to those in a nonexperimental situation
involving the supervisor’s use of checkups
and sanctions to maximize the amount
of time spent in work. The S in the ex- 
perimental setting is confronted by two 
concurrent operants, spatially distinct tasks
or responses simultaneously available to S 
(Catania, 1966; Ferster & Skinner, 1957). 
As with single operants, these tasks are 
simple, readily repeatable, and easily mea- 
sured, for example, pressing a lever or button, 
pulling a knob. Different schedules of rein- 
forcement or punishment are generally pro- 
grammed for each of the operants. In such 
a multitask situation, various consequences 
may be manipulated to attempt to eliminate 
an individual’s behavior on one of these 
operants while increasing it on a second. Such 
a condition appears to be functionally equiv- 
alent to the supervisor’s use of various means 
to attempt to maximize the amount of time 
a worker spends in task activity while mini- 
mizing various unauthorized behaviors. While 
previous research in experimental psychology 
has explored some of the variables controlling 
concurrent behavior, unfortunately the com- 
binations of conditions which might be 
generalized to a supervisory setting have not 
been studied. 

This study attempts to demonstrate the 
manner in which the effects of one type of 
consequence, punishment, can be explored 
under conditions relevant to the study of the 
effectiveness of supervision. Punishment of 
various magnitudes was administered on two 
basic schedules for behavior on one of the 
two tasks. The study is the first in a series 
of laboratory experiments using variables 
analogous to various supervisory and task 
work conditions. 

In its broadest sense punitive control in- 
cludes a variety of punishing behaviors 
ranging from fines, threats, or physical abuse 
to more subtle acts such as criticism, ridicule, 
slights, snubs, or avoidance, and thus is 
manifest, at least to some degree, in almost 
all supervisory situations. This study focused 
on two variables relevant to the use of punish- 
ment in affecting the choice of activities: the 
magnitude of the punitive consequences and 
the schedule with which they are adminis- 
tered. Two types of schedules, fixed and vari- 
able interval, were explored. Studies of two 
task settings have not investigated the effects
of interval punishment on task choice. Rather,
in previous research involving concurrent
operants (Reynolds, 1963) or two-choice risk-taking
situations (Kogan & Wallach, 1967),
punishment of one of the choices either has 
been continuous or has occurred for a par- 
ticular proportion of the task responses. In 
general such studies suggest a tendency 
toward the elimination of the punished be- 
havior as the negative consequences become 
high. The effects of fixed and variable interval 
schedules of punishment have been com- 
pared using a single operant (Azrin, 1956). 
These results indicate that variable interval 
schedules tend to produce more response 
suppression. 


METHOD 
Setting 


The experimental setting in this study involved a 
choice of two activities each of which was reinforced. 
Both activities were button-pressing tasks located 
at opposite ends of a small work room. For each 
task, S was reinforced for pressing a large button 
mounted on an instrument panel. The reinforcer was 
money. A counter mounted on the panel indicated 
how much money S had earned. The tasks differed 
in the amount of money that could be earned on 
them. The number of presses required before a 
reinforcement count was registered was greater for 
one of the tasks. To standardize the rate at which 
different Ss could work on either task, a 3-sec. time- 
out occurred after each response. The number of 
responses for each cent earned on the higher paying 
task (Task B) was half that on the other (Task A). 
With four responses for each cent required on 
Task B, Ss could earn approximately $2.80/hr; with 
eight responses required for each cent on Task A, Ss 
could earn $1.40. Thus, of the two, Task B was the 
more attractive. 
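
As a rough check on these rates, the following sketch (not part of the original study) computes the ceiling on hourly earnings implied by the 3-sec. time-out alone. It assumes, as an idealization, that a response takes no time beyond the time-out; the function name and its parameters are illustrative only.

```python
def max_hourly_earnings(responses_per_cent, timeout_sec=3.0):
    """Upper bound on dollars earned per hour on one task.

    Assumes each response costs only the time-out (an idealization);
    real Ss also need time to press, so actual rates fall below this.
    """
    responses_per_hour = 3600.0 / timeout_sec
    cents_per_hour = responses_per_hour / responses_per_cent
    return cents_per_hour / 100.0


task_b_ceiling = max_hourly_earnings(4)  # higher paying task (Task B)
task_a_ceiling = max_hourly_earnings(8)  # lower paying task (Task A)
print(task_b_ceiling, task_a_ceiling)
```

The resulting ceilings of $3.00 and $1.50 per hour are in line with the approximate rates of $2.80 and $1.40 reported above, and the two-to-one ratio between the tasks holds as long as a response takes the same time on either task.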

The effectiveness of the punitive consequences in 
changing task behavior was studied under conditions 
in which its interpretation would be relatively unambiguous.
The consequences were evaluated regarding
the degree to which they produced behavior on
Task A, the less attractive task. Thus work on 
Task B, the more attractive task, was punished. In 
most nonexperimental settings the unauthorized ac- 
tivities which the supervisor punishes are probably 
not consistently more attractive than any other activ- 
ity including the work itself, as in this study. Thus, 
if the consequences are effective in eliminating an 
activity which is considerably more attractive than 
any other situational alternative, they are likely to 
be at least as effective in other situations where the 
alternatives are of more equal attractiveness. 

Work on Task B, the higher paying alternative, 
was periodically penalized by a loss of money. Only 
one of the two tasks was operable at a time. An
S-controlled switch on Task A determined which
task could be used. The time at which work on 
Task B would be penalized was indicated by the 
sounding of a buzzer, regardless of which task S was 
operating. A penalty was administered only if S had 
Task B switched on when the buzzer sounded. A 
penalty count was added on a separate counter in 
the workroom; the amount of the penalty for that 
session was posted next to the counter. No conse- 
quences accompanied the buzzer if Task A was being 
operated. Since the changeover from work on 
Task B to Task A resulted in a several second delay 
while S crossed the room and turned on Task A, 
frequent switching to avoid penalties resulted in 
reduced reinforcement on either task. A clock on the 
wall was visible at all times. All events and mea- 
sures were programmed and recorded by automated 
equipment in an adjacent room. 


Procedure 


The Ss were told only how to operate the tasks 
and that the sound of the buzzer would be followed 
by a loss of money if they were working on Task B. 
The Ss were college students who were told before 
volunteering that they would have an opportunity 
to make money on a laboratory task. 

The effects of penalty magnitudes were explored 
under both fixed interval (FI) and variable interval 
(VI) schedules of supervision. Different Ss were used 
for each of the schedules. Within a schedule, however, 
Ss were exposed to several different penalty magni- 
tudes. Changes in penalty were made only after Ss 
evidenced stability in task work under a given 
condition. Since this investigation focused on the 
extensive study of several Ss in each variation, a 
statistical analysis of the performance was judged 
not to be appropriate. Rather, similar patterns of 
response were sought in response to changes in the 
experimental conditions. The Ss worked in sessions 
of 1-4 hr. in length several times a week. Payment 
was made at the conclusion of the total hours 
of work. 


RESULTS 
Fixed Interval Supervision Schedules


Seven Ss worked over periods ranging from 
4 to 14 hr. on several FI schedules in which 
the buzzer sounded after time periods of 
equal length throughout a work session. The 
different schedules included time intervals of 
1, 3, 5, or 10 min. Penalties from $.02 to 
$2.00 were used. The Ss worked at least 1 hr. 
under each of the penalties. 

The results indicated that none of the FI
punishment schedules was effective in producing
a substantial amount of activity on
Task A. Figure 1 shows the percentage of
time spent by five Ss on Task A working on
one of the schedules (FI 3 min.) under various
penalty conditions. After less than 1 hr.
of work under any of the schedules and 
penalty magnitudes, none of the Ss spent more 
than 30% of his time on Task A. With ex- 
perience on a schedule, Ss avoided virtually 
all penalties by switching from Task A im- 
mediately after the buzzer and switching back 
again a few seconds before the next buzzer. 


Variable Interval Schedule 


Four Ss worked over periods ranging from 
24 to 37 hr. on VI schedules in which the 
buzzer sounded after time periods of varying 
lengths. One schedule was used with an aver- 
age of 4 min. for each interval. The intervals 
varied between 10 sec. and 8 min. Penalties 
from $.01 to $1.00 were used. 

During the Ss’ first 2 hr. of work on this 
schedule, no penalties were administered al- 
though the buzzer continued to sound at the 
various intervals. In the remaining hours for 
each S, one of two progressions of penalties 
was used. Two Ss were begun on high penalties 
which were progressively decreased when 
intersession stability was achieved. The other 
two Ss were begun on low penalties which 
were progressively increased. Several penalty
magnitudes were repeated following intervening
periods of work under other penalties to
determine the replicability of their effects. The 
Ss worked at least 2 hr. under each penalty 
condition. 

Figure 2 shows the proportion of time Ss 
spent on Task A under the various penalty 
magnitudes. The results indicate that VI 
punishment was effective in producing activity 
on Task A. For all Ss the proportion of time 
spent on Task A increased with increasing 
penalty size. Small penalties of less than $.03 
had a small effect on task behavior while 
moderate penalties from $.05 to $.15 con- 
siderably increased the time spent on Task A. 
High penalties of $.25 or more generally 
resulted in time spent only on Task A after 
several hours of work. No pronounced effects 
appear to be caused by penalty sequence. 
Task performance under the various penalty 
conditions showed considerable stability and 
replicability particularly under the penalty 
extremes. For example, hourly differences in 
proportion of time on Task A under a given 
penalty averaged 9%. 


DISCUSSION 


The data clearly indicate the importance of 
different schedules in determining the effects 
of punishment on task choice. When penalties 
for work on one of the tasks were scheduled 
at equal intervals throughout a work period,
Ss learned quickly to avoid them. Thus
regardless of their magnitude the penalties 
proved relatively ineffective in increasing ac- 
tivity on the unpunished task. In contrast, 
when the penalties were scheduled at unequal 
intervals, no S spent a large amount of time
on the punished task without receiving a 
number of penalties. Under this condition, the 
larger the penalties the greater the time spent 
on the unpunished task. 
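
The contrast between the two schedules can be illustrated with a small simulation. The sketch below is not part of the original procedure. It assumes a hypothetical clock-watching worker who stays on Task B for a fixed time after each buzzer and then moves to Task A until the next buzzer. For the VI schedule it further assumes intervals drawn uniformly between 10 sec. and 470 sec. (mean 4 min.), since only the mean and the range are reported. Changeover time is ignored.

```python
import random


def run_session(intervals, b_window):
    """Simulate one session for a clock-watching worker.

    After every buzzer the worker stays on Task B for `b_window`
    seconds, then moves to Task A until the next buzzer. A penalty is
    counted when the buzzer finds the worker still on Task B.
    Returns (number of penalties, proportion of time on Task A).
    """
    penalties = 0
    time_on_a = 0.0
    for interval in intervals:
        if interval <= b_window:
            penalties += 1          # buzzer sounded during Task B work
        else:
            time_on_a += interval - b_window
    return penalties, time_on_a / sum(intervals)


# FI 3 min.: twenty equal 180-sec. intervals (one hour).
fi_intervals = [180.0] * 20

# VI averaging 4 min.: the uniform distribution is an assumption; the
# source gives only the mean and the range (10 sec. to 8 min.).
rng = random.Random(0)
vi_intervals = [rng.uniform(10.0, 470.0) for _ in range(15)]

fi_penalties, fi_share_a = run_session(fi_intervals, b_window=170.0)
vi_penalties, vi_share_a = run_session(vi_intervals, b_window=170.0)
print(fi_penalties, round(fi_share_a, 3))
print(vi_penalties, round(vi_share_a, 3))
```

Under the FI schedule this timing strategy avoids every penalty while spending only about 6% of the session on Task A. Under the VI schedule the same strategy is penalized whenever an interval is shorter than the time spent on Task B, so penalties can be avoided with certainty only by giving up Task B.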

Importantly, however, the effectiveness of 
VI penalties was not predictable as a direct 
function of their effect on total earnings. 
Low and moderate penalties produced more 
than the predicted amount of work on the 
unpunished task. The Ss tended to avoid 
losses often to the detriment of their total 
earnings. For example, Ss working on Task A 
earned approximately $1.40/hr., on Task B 
$2.80/hr. Thus with an average of 14-15 
penalties randomly distributed per hour, Ss 
could maximize their earnings by working 
only on Task A with penalties greater than 
$.10 and only on Task B with penalties less 
than $.10. With $.10 penalties, remaining 
on either task would result in approximately 
the same earnings. The results, however, indi- 
cate that with $.03 penalties, only two Ss 
spent no time on Task A during these periods. 
The other Ss spent 20 and 38% of their time, 
respectively, on the lower paying task. With 
$.05 penalties only one S spent no time on
Task A with the other Ss spending 26, 32, 
and 49% of their time, respectively, on that 
task. For each of these penalty magnitudes 
the rank orders of the average amount earned 
by each S and the proportion of time spent 
on the higher paying task correspond exactly. 
With $.10 penalties, the point at which either
task could be selected with little difference 
in earnings, all Ss spent more than half of 
their time on Task A. 
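
The break-even figure used in this argument can be reproduced from the reported rates. The following sketch is an illustration rather than the author's computation. It assumes that a worker who stays on Task B receives every penalty in the hour and that earnings on each task are exactly $2.80 and $1.40 per hour; the function names are hypothetical.

```python
def net_hourly_on_task_b(penalty, penalties_per_hour, rate_b=2.80):
    """Expected hourly earnings for working only on Task B."""
    return rate_b - penalties_per_hour * penalty


def break_even_penalty(penalties_per_hour, rate_a=1.40, rate_b=2.80):
    """Penalty size at which exclusive work on either task pays the same."""
    return (rate_b - rate_a) / penalties_per_hour


for n in (14, 15):
    print(n, round(break_even_penalty(n), 3))

# Earnings on Task B alone at the penalties discussed above, taking
# 14 penalties per hour (the low end of the 14-15 range).
for penalty in (0.03, 0.05, 0.10):
    print(penalty, round(net_hourly_on_task_b(penalty, 14), 2))
```

On these assumptions the break-even penalty lies between about $.093 and $.10, and exclusive work on Task B would still pay about $2.38/hr. at $.03 and $2.10/hr. at $.05. Both figures are well above the $1.40/hr. available on Task A, which is why the time spent on Task A at those penalties reduced total earnings.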

In conclusion, the inference drawn from 
these findings appears to be an important one 
for an analysis of the effectiveness of super- 
vision. The results strongly recommend the 
consideration of not only the type of super- 
visory activity but also the schedules with 
which it is performed. As the case of punitive 
control illustrates, schedule type in conjunction
with the magnitude of the punishment
may determine in large part the effectiveness
of that activity. 

A generalization of these results to non- 
experimental settings, however, should take 
note of the various limiting characteristics of 
this research. For example, punishment in the 
experimental setting was impersonal, specific 
to a given activity, and involved loss of money 
as the only aversive consequence. Much 
supervision in nonexperimental settings, how- 
ever, is personal, associated with a number 
of poorly specified activities, and may involve 
a number of different consequences. In the 
experimental situation only two activities 
were available and money was used as the 
reinforcer for both, while in other settings 
workers often have many alternatives avail- 
able which are reinforced in a variety of ways. 
In addition, workers on the job are often par- 
ticipants in formal or informal groups in 
which additional standards, pressures, or sanc- 
tions are imposed. To what extent such condi- 
tions alter the relationships found in this 
research in a “minimal” task situation will 
need to be determined.

A quantitative procedure for making staff selections is applied to the problem
of hiring a research chemist. The procedure incorporates marginal productivity
and marginal cost principles from economics, a numerical prediction score, and
Bayesian principles of probability.


This article presents a procedure for select- 
ing staff members in terms of an idea which 
is both old and new. The idea is an old one 
in that marginal theory in the purchase of 
the factors of production has existed for a 
century and a half, having originated with 
the British classical economists (Marshall, 
1920). The idea appears new in that marginal 
productivity theory has become implicit in the 
staff-hiring decisions made by some personnel 
managers and executives whom the author has 
interrogated within recent years. This growing 
involvement gives the marginal productivity 
concept of hiring a face validity which justi- 
fies the effort to spell out explicitly what is 
assumed. 

Brogden and Taylor (1950) maintain that 
whatever employee characteristics augment or 
decrease productive output can be accounted 
for in terms of dollars. Whoever at the super- 
visory level is best able to evaluate such 
factors in monetary value should do so. Haire 
(1959) points out that much more is involved 
in successful assignment than the dollar value 
of the employee’s output, for example, job 
satisfaction, personality difficulties, grievances, 
job turnover, and the like. Dunnette (1963) 
rejects any single criterion of job success, and 
Wallace (1965) criticizes utility as a criterion. 

That cost measures of the consequences of 
hiring continue to gain in popularity is re- 
flected by studies such as those by Guttman 
and Raju (1965) and by Mahoney and 
England (1965). The present author recognizes
that productivity means much more than
physical units of output from the individual 
employee. In this expanded sense, produc- 
tivity is used in developing the selection 
procedure described here. The Bayesian con- 
cept of minimizing the weighted average risk 
(or of maximizing the weighted average of 
probable gains) is used (Birnbaum & Maxwell, 
1960; Cronbach & Gleser, 1965). 

When the wage policy for employee clas- 
sification is given to the personnel manager 
as a job specification, he endeavors to hire 
the best qualified applicants within the wage 
requirement. In a fixed-wage hiring decision, 
the marginal productivity theory of wages 
does not enter the problem of which indi- 
vidual to select. 

The situation is different when the person- 
nel manager is instructed to look for a staff 
member whose salary is open, depending upon 
qualifications, such as a salesman, researcher, 
engineer, manager, or executive. Then prin- 
ciples of marginal productivity guide the 
employment decision. The flexible-wage hiring 
decision is the type of employment decision 
reviewed here. 


MARGINAL PRODUCTIVITY THEORY FOR 
SELECTING FACTORS OF PRODUCTION 


Before proceeding with the mechanics of 
this decision, the principles of marginal pro- 
ductivity which involve staff selection may be 
briefly reviewed (Samuelson, 1953, 1964). A 
short statement of marginal analysis was given 
by Benson (1967). 

Consider the two principal components of 
production: men and machines. If the avail- 
able money for producing goods to meet 
company sales is $1,000,000, this is divided 
between annual payroll for labor and annual
outlay for machines in that ratio which
maximizes production. If there are too few 
men to operate machines, production falls. If 
there are too few machines, production like- 
wise decreases. 

The cost of production problem may be 
more easily visualized if the number of ma- 
chines is considered fixed for the period of 
time under consideration. The question is 
asked: How large a labor force shall be em- 
ployed to man the machines? Manifestly, as 
more men are hired, the output produced 
continues to rise. As men increasingly get in 
each other’s way, output per man is reduced. 
The relationship between number of em- 
ployees and amount produced is not a straight 
line, but rather a curve of diminishing in- 
creases in output as further men are added 
to the labor force already employed. 

This concept of marginal productivity is 
readily shown by a graph of the value of the 
output produced. The curve in Figure 1 shows 
the increasing value of the total goods pro- 
duced as more labor is purchased. As more 
goods are produced, their total value increases 
at a decreasing rate, owing to the increasing 
difficulty of making the goods with a limited 
number of machines in relation to the number 
of men. 

FIG. 1. Relationship between cost of wages
and value of production.

Where the slope of the curve is 45°, the
increase in the value of the goods is just equal
to the cost of the last, or marginal, unit of
labor bought. Below this point, the increment
in value of goods sold is greater than the
increment in the cost of labor. This indicates
that money is lost by not producing more.
Above this point, the increment in the cost 
of labor is greater than the increment in 
the value of goods. This indicates that money 
is lost by spending more in producing goods 
than sales bring. The zero point of the graph 
is chosen where the cost of labor line meets 
the vertical axis. 
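
The 45° rule can be checked numerically. The sketch below is illustrative only: the square-root value curve and its coefficient are assumptions chosen to show diminishing returns, not values read from Figure 1. A slope of 45° corresponds to a slope of 1 only when both axes are scaled in the same dollar units.

```python
import math


def value_of_output(labor_cost):
    """Hypothetical concave curve: dollar value of goods produced.

    The square-root form is an assumption chosen only because it shows
    diminishing returns; it is not taken from Figure 1.
    """
    return 1200.0 * math.sqrt(labor_cost)


def slope(f, x, h=1.0):
    """Central-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Search labor outlays from $10,000 to $1,000,000 in $1,000 steps for
# the largest excess of value over cost.
outlays = range(10_000, 1_000_001, 1_000)
best_outlay = max(outlays, key=lambda c: value_of_output(c) - c)

print(best_outlay)
print(round(slope(value_of_output, best_outlay), 3))
```

For this assumed curve the excess of value over labor cost is greatest at an outlay of $360,000, and the estimated slope there is 1.0. At smaller outlays the slope exceeds 1 and at larger outlays it falls below 1, which matches the two cases described above.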


THE DECISION TO ADD A STAFF MEMBER


Marginal productivity theory assumes that 
management is rational and knowledgeable. 
The theory sets forth an ideal for management 
to pursue in its quest to maximize financial 
return to those who provide capital for the 
enterprise. The provision of capital is also a 
factor of production controlled by marginal 
productivity theory. 

To discover how the theory works, it is 
worth threading the line of administrative 
authority which eventually results in hiring 
a senior chemical analyst in the research 
laboratory. 

The management of the company looks at 
the laboratory and decides how much research 
is needed for maximum sales at an acceptable 
margin of profit to stockholders. Management 
considers such things as number of new 
products needed, number of patents, number 
of improvements in manufacturing processes, 
and the effects of these things on augmenting 
sales. The head of the laboratory works within 
a budget. He is expected to produce as much 
research as possible within this budget. If he 
can produce research economically, manage- 
ment will conclude it can afford to undertake 
more research. The budget will be increased 
according to marginal productivity theory. If 
research becomes too costly in relation to 
what it brings back in sales, the budget may 
be reduced. 

The research director builds up his staff 
with that mixture of abilities which will maxi- 
mize the output of the laboratory within his 
allotted budget. He increases his staff only 
when he has authority to do so through a 
larger budget. He makes decisions from time 
to time in hiring replacements. These decisions 
are threefold. (a) How much money shall be
spent to fill a job at the correct level of ability?
(b) Which applicant best fits the job
specifications? (c) What is the right salary to
pay the applicant in view of his particular
qualifications? In this complex problem area 
the personnel department provides its research 
services to help the director. 


SALARY LEVEL IN RELATION TO MARGINAL 
PRODUCTIVITY 


Before a job can be filled, job specifications 
are prepared. Besides defining the kind of job 
to be filled, these specifications define a 
certain performance level. This level costs an 
expected annual salary which represents the 
market worth for a man of the required 
capability. The principal measure of the per- 
formance level is the salary cost in the em- 
ployment market for procuring that level of 
performance. 

For what performance level in a senior 
chemical analyst is it desirable to pay? If the 
man is merely to conduct routine analysis, 
it is not profitable to hire a brilliant and 
imaginative chemist at twice the salary 
needed for someone to do dependable analysis 
in the laboratory. It would also be unprofit- 
able to hire an incompetent assistant incapa- 
ble of complicated and accurate analysis. In 
either event, the output of the laboratory 
would suffer. 

Whatever the units of output of the labora- 
tory, the director judges how much it costs in 
different materials and skills for the last or 
marginal unit of output which is added by 
his administrative decisions. If he can hire 
an added man whose increment in output 
costs less than the same increment in output 
costs by using other factors, then he hires that 
man and uses less of other factors of 
production in his laboratory. 

By hiring the optimum number of chemical 
analysts at performance levels which are most 
efficient for laboratory output, he is hiring 
each man at the point of expenditure where 
the man’s contribution to laboratory output 
is just equal in value to the cost of hiring 
him. Here, the value of the increment in out- 
put is established by the prevailing costs of 
other factors of research needed to achieve 
such output. The increment in output is the 
change in output achieved by moving from 
one salary level to another. This must be 
estimated by the hiring officials or learned 
through research.

For maximum output within the available
cost outlay each vacancy should be filled at 
that performance level where the output 
added by the employee compared with what 
is added by one of less capability at less 
salary just equals the wage difference paid the 
employee who fills the vacancy. The vacancy 
should not be filled at a lower level of per- 
formance, for the drop in output is greater 
than the saving in salary. 

Nor should the vacancy be filled at a higher 
level of performance and salary. The gain in 
output is then less than the increment in 
salary cost. If the overhead expense of adding 
the man is simultaneously considered, the key 
to be evaluated is the difference in output 
between hiring Applicant A at one salary level 
plus overhead cost and hiring Applicant B 
at another salary level plus overhead cost. 
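
The comparison between two applicants can be written out as a short sketch. All dollar figures below are hypothetical, and the class and function names are illustrative; the sketch assumes that the value of each applicant's output has already been estimated in dollars.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    output_value: float  # estimated annual value of output, dollars
    salary: float        # annual salary, dollars
    overhead: float      # annual overhead for adding the man, dollars

    def net_value(self):
        """Value of output less salary and overhead."""
        return self.output_value - self.salary - self.overhead


def prefer(a, b):
    """Return the applicant whose extra output justifies his extra cost.

    Choosing b over a pays only when the difference in output exceeds
    the difference in salary plus overhead, which is the same as
    comparing net values.
    """
    extra_output = b.output_value - a.output_value
    extra_cost = (b.salary + b.overhead) - (a.salary + a.overhead)
    return b if extra_output > extra_cost else a


# All figures below are hypothetical.
applicant_a = Applicant("A", output_value=30_000, salary=14_000, overhead=4_000)
applicant_b = Applicant("B", output_value=33_000, salary=18_000, overhead=4_500)

choice = prefer(applicant_a, applicant_b)
print(choice.name, choice.net_value())
```

In this example Applicant B adds $3,000 in output but costs $4,500 more in salary and overhead, so Applicant A is preferred. This is the case of a vacancy filled at too high a level, where the gain in output is less than the increment in cost.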

It is convenient to graph the problem of 
performance level in terms of a net produc- 
tivity curve. Figure 2 shows the curve of 
the value of the output added by the job of 
senior chemical analyst. The height of this 
curve is a function of increasing levels of 
performance, measured in salary terms. From 
this curve is subtracted the salary cost at 
each performance level. This yields the net 
productivity curve. The highest point of this 
curve is found where the slope of the pro- 
ductivity curve is 45°. The advantage in 
using the net productivity curve is that the 
hiring problem is more readily visualized as 
one of optimization. The highest point is 
achieved during job specification and appli- 
cant selection. 
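
The optimization just described can be sketched numerically. The following is a minimal illustration rather than the author's procedure: `output_value` is a hypothetical concave curve, chosen only so that its optimum falls at a salary level of 12.5 (in thousands of dollars), and all names are assumptions made for the example.

```python
import math

def output_value(salary_level):
    """Hypothetical value of output (thousands) at a given performance
    level, with performance measured in salary terms (thousands)."""
    return 2.0 * math.sqrt(12.5 * salary_level)

def net_productivity(salary_level):
    """Value of output minus the salary cost at that performance level."""
    return output_value(salary_level) - salary_level

# Search salary levels from 5.0 to 20.0 thousand in steps of 0.1.
levels = [i / 10 for i in range(50, 201)]
best_level = max(levels, key=net_productivity)

# Slope of the value-of-output curve at the optimum (central difference).
h = 1e-3
slope_at_best = (output_value(best_level + h) - output_value(best_level - h)) / (2 * h)

print(best_level, round(slope_at_best, 3))
```

Under these assumptions the slope of the value curve at the best level comes out close to 1, which corresponds to 45° when both axes are drawn in the same units.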

Fig. 2. Relationship between performance level and net productivity.

The ideas of management concerning
changes in output resulting from different
salary levels are vague at best. Management
usually thinks in terms of what it would be 
just willing to pay as a bargain, if it could get 
someone whose performance level is too low 
or too high. The more rigorous approach is 
to think first of output resulting from dif- 
ferent performance levels and then to subtract 
the salary which must be paid from the value 
of the output achieved. If necessary, the vari- 
able overhead costs for supervising a man 
of more or less capability should be included 
in calculating the net productivity curve. 

Ideally, experimentation would be under- 
taken within a company to investigate the 
effect on productivity of assigning employees 
at various salary-performance levels. Short of 
an expensive experimental design, use may 
be made of historical data in which unplanned 
variations in assignment have occurred. 

Usually it is necessary to rely upon esti- 
mates of net productivity made by managers. 
They have an idea of the optimum perform- 
ance level at which to hire. The curve of 
diminishing returns is also required. To define 
this, supervisors should be asked, “If, as a 
bargain, you were able to hire an employee 
at the X-dollar level of performance, how 
much would you be willing to pay him?” 
From the judgments reported, the required 
curve can be drawn. A capable manager can 
answer in terms of the effect of salary level 
upon the output of the section, knowing what 
he should know about the marginal cost of 
securing changes in output through various 
administrative procedures. 


TYPES OF INDIVIDUAL PERSONNEL
DECISIONS


Several component decisions may now be 
differentiated. 

1. Job specification. An aim of specification 
is to peg the salary-performance level at the 
point of highest marginal productivity to the 
company, after subtracting the salary paid. 
This is the point of highest profitability to 
the company. If the job is set too low, the 
company loses money through inefficient per- 
formance of the job. If the job is set too 
high, the company is paying for job quality 
it does not require. If the personnel manager 
is furnished the wrong job specification, he 
may find a man who is not good enough to
fill the job, or one who is costly because
he is better quality material than the job 
actually requires. 

2. Individual selection. The aim of the next 
step is to find the individual whose perform- 
ance most closely matches the job description. 
If the job is filled with someone who does 
not meet the specified level properly, the ef- 
fect of the error is the same as if the job were 
incorrectly specified before hiring. The job 
would then be filled with a man who is away 
from the optimum to one side or the other. 

If the marginal productivity curve is rela- 
tively flat, it may make little difference 
whether the job is filled at optimum level. 
An error to either side would be minor in 
its effect. If the productivity curve, after sub- 
tracting the salary line, is quite peaked, a 
small error may have serious consequence. 
These considerations are illustrated by the 
curves in Figure 3. 

3. Paying the right salary. If a manager 
pays too high a salary rate for his staff mem- 
bers, the effect is to lower the net productiv- 
ity curve. The aim in hiring a single indi- 
vidual is to pay no more salary than is 
required to persuade him to accept employ- 
ment. These details are illustrated in Figure 4. 
Figure 5 diagrams nine kinds of personnel 
choices. The three possibilities of paying more 
than, less than, or the right salary are com- 
bined with the three possibilities of hiring 
a man undergrade, overgrade, or at the right 
grade. 

Fig. 3. Effects of noncritical and critical performance levels upon value of output.

Fig. 4. Relationship between asking salary and net value of output.

4. Selecting the correct hiring opportunity.
The usual situation is one of several qualified
applicants with somewhat different abilities
and salary requirements. The aim is to select
the applicant for whom the net productivity
is highest. A graph of the net productivity
curve which marks the positions of applicants 
for the job is a useful aid in visualizing the 
hiring options. After the job curve has been 
drawn, the two pieces of information needed 
concerning the applicant are his performance 
level, measured in job-salary worth in the 
marketplace, and his salary requirement. His 
position on the graph relative to other 
applicants is then clear.

If an arbitrary limit has been imposed by 
management upon salary, then the aim is to 
select the best qualified applicant at the salary 
which can be paid. If an applicant can be 
hired for less than this, the choice is for the 
man whose net productivity is highest, within 
the salary restriction.


COST OF JOB REPLACEMENT


Fig. 5. Locations of net productivities of nine applicants with high, medium, and low performances and salaries.

Fig. 6. Drop in net productivity curve of the job with job replacement.

The cost of replacing an employee may be
much greater than the drop-off from not filling
specifications closely, or from paying him
somewhat more than he is worth. When a
man must be replaced, the cost to the company
is illustrated in Figure 6. When the
break-point is reached for replacing a man
who is under ability or over ability, the net
productivity curve drops perpendicularly. 
Past hiring experiences, both successful and 
unsuccessful, afford data for analyzing risks 
and costs in personnel decisions. Whether the 
employee makes good or not, and he usually 
does perform satisfactorily, follow-up research 
should ascertain how correctly the job was 
specified and whether the job specifications 
were properly met. Especially when a man is 
displaced, the personnel department should 
make inquiry to determine whether the job 
was correctly described and filled, and 
whether at the correct level of compensation. 


ROLE OF THE PREDICTION SCORE IN 
MARGINAL PRODUCTIVITY HIRING


A numerical score for predicting job suc- 
cess combines numerical weights established 
through multiple regression analysis of vari- 
ables associated with job success in past cases. 

Applied to the marginal productivity pro- 
cedure for hiring, the prediction score should 
estimate the market worth of the individual 
in terms of salary for filling the specified job. 
Then the productivity graph, as in Figure 5, 
can be marked on the horizontal axis to show 
how near the optimum the individual falls 
on the net productivity curve. At the same 
time, the numerical score indicates the proper 
salary to be paid to the prospective employee. 
Depending upon whether he accepts a some- 
what higher or lower salary, his net productiv- 
ity is below or above the curve. 

The dependent variable in the multiple- 
regression analysis leading to the prediction 
score is salary paid in relation to performance.
In a general sense, the salary received by
present employees can be used as the dependent
variable. Alternatively, estimates by
associates of the salary which each individual 
is worth may be used. 

The prediction score contains errors from 
sampling traits of the individual, errors from 
sampling populations of employees, errors in 
ratings, and errors of estimation resulting from 
the incomplete correlation between predictor 
variables and the predicted numerical score. 
The prediction score does not give a knife- 
edge estimate of performance in market salary 
terms, but a statistical distribution of prob- 
able levels of performance. Corresponding to 
the distribution of levels is a distribution of 
expected net productivities from hiring the 
individual. The summation of these separate 
productivities times the chances of their oc- 
currence gives the expected productivity re- 
turn from hiring the individual. This applies 
Bayesian decision theory. 

If the numerical functions involved are 
continuous and adequately described, exact 
integration can be performed. Otherwise, ap- 
proximation methods must suffice. A pro- 
cedure which combines knowledge of the 
standard error of estimate of the numerical 
score and knowledge of the net productivity 
curve is illustrated in Figure 7. The expected 
productivity from hiring the applicant is equal 
to the average of all of the productivities 
within the error distribution. An approximate 
procedure marks off, by vertical lines, the 
normal error distribution into 10 or 20 equal 
areas. The mean of the net productivities for 
the midpoints in each of these intervals 
estimates the expected productivity from 
hiring the applicant.

Fig. 7. Calculation of expected net productivity when the distribution of errors in estimating performance is known.

From the foregoing, it is apparent that job
success is not a categorical question, but 
rather one of relative financial return to the 
company. If the individual’s own future can 
be weighed in monetary terms, this also can 
be included in the comparison of alternative 
job possibilities. 

If little is known about the applicant, the 
standard error of his prediction is large. The 
error distribution then encompasses both the 
risks of hiring someone inadequate for the 
job or too costly for the job. The negative 
productivity values from the vertical drop-off 
on either side of the net productivity curve 
then result in much smaller expected value 
to the company from hiring the applicant. 
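
The equal-area approximation and the effect of a large standard error can be sketched as follows. This is an illustrative reading, not the author's computation: the quadratic `net_productivity` curve is hypothetical, the replacement drop-off at the edges of the curve is not modeled, and taking each slice's representative point at the middle of its probability range is an assumption about how "midpoint" is meant.

```python
from statistics import NormalDist

def net_productivity(x):
    """Hypothetical net productivity curve (thousands), peaking at 0 when
    the performance level x (thousands from the optimum) is 0."""
    return -0.3 * x ** 2

def expected_net_productivity(predicted, std_error, curve, slices=20):
    """Average the curve over equal-probability slices of a normal error
    distribution centered on the predicted performance level."""
    errors = NormalDist(mu=predicted, sigma=std_error)
    # Representative point of each slice: the value at the middle of its
    # probability range (an assumption, see the lead-in).
    points = [errors.inv_cdf((i + 0.5) / slices) for i in range(slices)]
    return sum(curve(x) for x in points) / slices

# Same predicted level, small versus large standard error of prediction.
well_known = expected_net_productivity(0.0, 0.5, net_productivity)
little_known = expected_net_productivity(0.0, 2.0, net_productivity)
print(well_known, little_known)
```

With this hypothetical curve the applicant about whom little is known shows the lower expected net productivity, even though both are predicted to fall exactly at the optimum.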

After experience has shown how much
reduction in the standard error of the predic- 
tion score can be gained from research outlay, 
then it can be decided if it is financially 
desirable to strive for greater precision in the 
prediction score. 


SUMMARY OF STEPS IN THE MARGINAL 
PRODUCTIVITY HIRING PROCEDURE 


1. The job vacancy is fully specified re- 
garding both the abilities and qualifications 
of the person required to fill it and the esti- 
mated salary in the marketplace of a person 
capable of filling it at the optimum salary 
level. (If the job vacancy is a position which 
a trainee is expected to fill in the future, the 
analysis is built around the future assignment. 
In this case the procedure is a logical exten- 
sion of that given for filling an immediate 
position.) 

2. A graph is constructed of the net pro- 
ductivity curve. This curve is based upon 
experimental or historical data, if available. 
Otherwise, the curve is located by asking the 
department head and his associates for esti- 
mates of the net worth to the department of 
filling the job vacancy at various salary levels. 

3. The abilities and qualifications of each 
applicant are described in detail from resumes, 
references, transcripts, and interviews. 

4. An individual estimate is made of the 
market value of each applicant in salary 
terms. How much are his abilities worth when 
measured in terms of what the company 
expects to pay? This estimate can be made 
from multiple-regression analysis of salaries
and qualifications of present personnel and
their job specifications, or from intuitive judg- 
ment of the personnel officer, or from both 
of these sources of information. (If the esti- 
mate is recognizably approximate, then the 
distribution of errors of estimate must be 
considered in making an average estimate of 
the possible results.) 

5. The minimum salary which the applicant 
will accept is ascertained. 

6. The individual’s position on the net pro- 
ductivity graph is plotted by locating his 
market salary worth as the horizontal co- 
ordinate. The vertical coordinate is given by 
subtracting the surplus of his salary require- 
ment over his salary worth from the net 
productivity curve. The height of this position 
gives the net productivity of the applicant. 

7. Of the various applicants, that one is 
hired whose coordinate position shows the 
highest net productivity. If a salary limitation 
has been imposed, then that applicant is hired 
who falls within the salary limitation and 
whose net productivity is highest. 

Example. A job vacancy exists for a senior 
chemical analyst. Supervisors report that the 
job would ideally be filled by a $12,500-a-year 
man. If filled by a $10,000-a-year man in- 
stead of a $12,500-a-year man, the depart- 
ment would be willing to pay only $8,000 for 
him in terms of anticipated department output 
priced at prevailing costs for such output. If 
the job is filled by a $15,000-a-year man, the 
department would feel it worth paying only 
$13,500 for what he could do in the job
opening. 

These details fix the net productivity curve 
as down $2,000 at the $10,000 horizontal 
coordinate, and down $1,500 at the $15,000 
horizontal coordinate. Taking the origin at the 
peak of the curve, $12,500, the formula fitted 
to these three points is Y = -.28X² + .016X³,
where Y is the net productivity in thousands, 
and X is the salary worth in thousands. The 
resulting curve is drawn in Figure 8. 

Fig. 8. Calculation of net productivities of three applicants.

Individual A has qualifications which in the
opinion of hiring officers are worth $11,000
in the marketplace for one who can fill the
job opening. His asking salary is $11,500.
The difference between these two figures is
$500. His market worth for qualifications
useful in the job opening locate him along
the horizontal at $11,000. His salary require- 
ment is $500 in excess of this, which lo- 
cates him $500 under the net productivity 
curve and defines his net productivity as 
(-$684 - $500) = -$1,184. Similarly, the
net productivities of Applicants B and C
are found from their data to be -$572 and
-$264. Applicant C, having the highest net
productivity, is hired. 
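
The arithmetic of this example can be checked with a short sketch. The function names are illustrative assumptions; the numbers are those given above for the job and for Applicant A (the data for Applicants B and C are not stated in the text and are therefore not reproduced). Amounts are in thousands of dollars, with X measured from the peak at $12,500.

```python
def fit_curve(x_low, y_low, x_high, y_high):
    """Solve for a, b in Y = a*X**2 + b*X**3 through two points, with the
    peak of the curve taken as the origin (Cramer's rule)."""
    det = x_low ** 2 * x_high ** 3 - x_high ** 2 * x_low ** 3
    a = (y_low * x_high ** 3 - y_high * x_low ** 3) / det
    b = (x_low ** 2 * y_high - x_high ** 2 * y_low) / det
    return a, b

def applicant_net_productivity(worth, asking, peak, a, b):
    """Height of the curve at the applicant's market worth, less any
    surplus of asking salary over that worth."""
    x = worth - peak
    on_curve = a * x ** 2 + b * x ** 3
    surplus = asking - worth
    return on_curve - surplus

PEAK = 12.5
# Down 2.0 at the 10.0 level and down 1.5 at the 15.0 level.
a, b = fit_curve(10.0 - PEAK, -2.0, 15.0 - PEAK, -1.5)
# Applicant A: market worth 11.0, asking salary 11.5.
net_a = applicant_net_productivity(11.0, 11.5, PEAK, a, b)
print(round(a, 3), round(b, 3), round(net_a, 3))
```

The fitted coefficients agree with the formula above, and the result for Applicant A agrees with the -$1,184 figure.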

It will be noted that the net productivities 
are measured in relation to what would be 
the maximum departmental output if the job 
vacancy were exactly filled by one whose 
salary requirement is equal to his salary 
worth. This provides the most convenient 
origin for carrying through the computation. 
The profitability of the department to the 
company is larger than this. If the profitabil- 
ity of the department were not believed to be 
positive, its budget would be altered by man- 
agement. The curves are drawn with reference 
to net value of output from hiring the 
applicant.

Some clever high school students are academically successful, but others with
equally high IQs are not. Thus, it is possible that achievement involves cogni- 
tive variables other than simply level of intellect (IQ). Science achievement 
was shown to be related to several such variables including abstract thinking, 
originality, and category width. These findings were interpreted as supporting 
the importance of “intellectual style” in achievement. Large sex differences were 
also obtained. Better understanding of the role of style in classroom perform- 
ance would be particularly useful in view of the current desire to identify talent 


early and to foster its realization. 


Not all people with high IQs make out- 
standing contributions to their particular areas 
of specialization. Terman’s geniuses, for ex- 
ample, although much more successful as a 
group than the average, were not universally 
characterized by exceptional intellectual con- 
tributions in the various fields they entered 
(Terman & Oden, 1959). Such substantial, 
but far from perfect, relationships between 
level of ability and performance are found 
in science too. Despite increased interest in 
fostering scientific achievement in recent
years, there is still no clear understanding of 
how potentially successful science students 
differ from other able students who either do 
not enter science, or who enter the field but 
do not succeed. It is now apparent that mere 
level of ability (usually expressed in the form 
of an IQ) is not the key factor which dis- 
tinguishes the science specialist from the non- 
scientist, nor the successful science candidate 
from the unsuccessful. Gibson and Light 
(1967), for example, were unable to distin- 
guish among Cambridge University scientists, 
or to distinguish them unequivocally from 
nonscientists on the basis of IQ. Although 
level of ability is clearly relevant, perhaps in 
the form of a “threshold” (McClelland, 1958, 
pp. 12-13) below which effective scientific 
thinking is not possible, not all successful 
scientists have unusually high IQs, nor are all 
would-be scientists with high IQs successful. 

1 Requests for reprints should be sent to A. J.

Increasingly nowadays the notion of intellectual
“style” or “bias” (Hudson, 1966), as
against mere level, is being emphasized as an
important variable in the study of intellectual 
processes. The concept of style refers to the 
existence of certain stable idiosyncratic dif- 
ferences among people in the way in which 
they go about taking in, processing, and uti- 
lizing information obtained from their envi- 
ronments (Schroder, Driver, & Streufert, 
1967). Two people of equal capacity may 
differ markedly in the characteristic ways in 
which they deploy their intellectual resources 
in coming to grips with information and in 
the kinds of information which they prefer to 
handle. The concept of style is, in fact, now 
well established in the literature of cognitive 
psychology (Ausubel & Ausubel, 1966; Gard- 
ner, Holzman, Klein, Linton, & Spence, 1959; 
Witkin, 1964), but less frequently utilized 
in the applied field. 

Furthermore, there is some empirical evi- 
dence that preference for science as against 
the arts (Hudson, 1963a, 1963b) and out- 
standing achievement in science (Cropley, 
1967b) are related to intellectual style, while 
theory and popular stereotype both suggest 
that there is a distinct cluster of style vari- 
ables characterizing scientists (Barron, 1965, 
pp. 85-86). Consequently, the present study 
was concerned with the question of whether 
high science achievers at senior high school 
level did differ markedly from low achievers 
in terms of style, as against level, of intellect. 
The relationship of science achievement to 
IQ is well known (Cline, Richards, & Need- 
ham, 1963). The present study was aimed 
at answering the question of whether differences
of a more qualitative kind exist between
achievers and nonachievers. 


METHOD 
Subjects 


A battery of tests was administered to all fifth 
and sixth form students in two high schools in large 
country towns in New South Wales, Australia. A 
total of 178 science students ranging in age from 15 
yr. 8 mo. to 18 yr. 7 mo. completed all tests. This 
group included 104 boys with a mean age of 17.09 
yr. (SD = .64 yr.), and 74 girls for whom the mean
age was 16.89 yr. (SD = .61 yr.).


Tests 


The battery of tests included a standardized test 
of science achievement specifically designed for use 
with Australian students at this level, published by 
the Australian Council for Educational Research 
(ACER). A measure of level of ability (IQ) was 
also obtained, using the AL-AQ test of intelligence. 
This test, again published by ACER, is intended for 
use with students at senior high school level and 
above, and so avoids some of the variance restriction 
inherent in a highly selected sample like the present 
one. Finally, four tests involving what is here called 
intellectual style were administered. These four tests 
included tests of originality (Torrance, 1962), flexi- 
bility (Torrance, 1962), category width (Pettigrew, 
1958), and a test of the abstractness of intellectual 
functioning based on the developmental psychology of 
Piaget (Tisher, 1962). This test involved showing 
the students relatively common situations, for ex- 
ample, two partly filled containers of water linked 
by a tube. The students were asked both to predict 
the results of certain perturbations introduced into 
the situations, for example, raising one container 
above the other, and, given a result, to say what 
perturbation would have been necessary to yield that
result. Scoring was based, not on the rightness or
wrongness of answers, but on the extent to which 
answers were of a concrete or formal kind. Thus, the 
Tisher test is a version of the Piaget interview tech- 
nique (e.g., Inhelder & Piaget, 1958). Tisher (1962) 
reported high agreement between ratings obtained 
actually using the standard interview technique and 
ratings yielded by his group procedures. 


Procedure 


Standard tests were scored according to the pub- 
lished specifications. The originality score was ob- 
tained by scoring the Tin Can Uses test according 
to the differential weighting procedure described by 
Torrance (1962) and using the weights suggested by 
Cropley (1967a, pp. 109-110). The flexibility score 
involved rescoring the same test, this time allotting 
a point for each clear switch of topic in a given 
student’s responses to a particular item (Torrance, 
1962). Finally, the scores for abstractness of thinking 
were obtained from ratings of students as either 
predominantly concrete, early formal, or late formal, 
in their style of responding to the Tisher test. Stu- 
dents were similarly trichotomized on the other three 
style variables too, by dividing them into high, 
middle, and low scorers. As nearly as tied scores 
permitted, each of these groups contained an exact 
third of the total group. Subsequently, the three 
groups were further subdivided according to sex. A 
two-way analysis of covariance was then carried out 
on the science achievement scores of the various style 
groups, with IQ as covariate in each case, following 
the model proposed by Winer (1962, pp. 590-600). 


RESULTS 
Product-moment intercorrelations, means, 
and standard deviations for achievement, IQ, 


and style variables are shown in Table 1. 
Where data are available, reliabilities are 


TABLE 1 


INTERCORRELATIONS, MEANS, STANDARD DEVIATIONS, AND RELIABILITIES 
FOR ALL VARIABLES 





Variable                   1            2            3            4            5            6

1. Abstract thinking       —           -.08         -.01          .02      [illegible]      .39
2. Originality         [illegible]      —            .74          .21          .07         -.05
3. Flexibility            .16          .61           —        [illegible]      .22          .01
4. Category width         .03         -.02          .06           —        [illegible]     -.01
5. IQ                     .49          .09          .04         -.03           —            .64
6. Achievement            .49      [illegible]      .18         -.21          .67           —

Males
   M                     2.29         48.9         16.6         66.3        117.2         47.5
   SD                    0.78         19.5         3.28         14.7         10.3         8.34
Females
   M                  [illegible]     37.5         14.8         54.3        117.4         44.3
   SD                    0.75         16.6      [illegible]     16.1         10.2         8.35

Reliability                —         .64-.71      .60-.62    [illegible]      .88           —


Note.—With df = 72, critical value of a correlation coefficient (p < .05) is .225; with df = 102, the value is .191.

TABLE 2
MEANS, STANDARD DEVIATIONS, AND Fs FOR ACHIEVEMENT

                          High                   Middle                   Low                      Fs a
Category            M      SD      N        M      SD      N        M      SD      N      Main effect   Interaction

Operations
  Males               [illegible]          45.7   8.03    61       42.6   5.49    10         3.90*         0.13
  Females             [illegible]            [illegible]             [illegible]
Originality
  Males            47.8   9.63    47       48.4   8.47    31       47.8   7.83    26         1.09       [illegible]
  Females          46.6   8.44    13       44.9   6.82    28       42.7   8.53    33
Flexibility
  Males            48.0   8.92    39       46.5   7.91    43       48.3   7.88    22         0.66          2.65
  Females          45.3   8.23  [illegible]   [illegible]   23       [illegible]
Category width
  Males            46.8   7.88    49       48.6   9.20    31       47.4   7.90    24      [illegible]      1.43
  Females          40.7   5.65    11       43.9   9.80    28       46.0   7.43    35

a For all Fs, df = 2/171.
* p < .05.
** p < .01.


also shown. The figures for originality, flexi- 
bility, and category width are test-retest reli- 
abilities, while for IQ they are Spearman- 
Brown split-half coefficients. The two figures 
for originality and flexibility indicate the 
range of reliabilities cited in a number of 
reliability studies reported in the test manual. 
Correlation data above the diagonal in Table 
1 are those for boys, data below the diagonal 
are those for girls. 

Results of the analyses of covariance of sex, 
style, and achievement data are shown in 
Table 2. This table also shows achievement 
means for the various style groups and 
numbers in those groups. Relationships among 
style, achievement, and sex were significant 
for three style variables out of four. 

The particular procedure adopted in form- 
ing the style groups made it possible to test 
the significance of differences between males 
and females in their distribution among the 
high, middle, and low categories for each style 
variable. Relationships were significant in all 
cases, the χ² value for operations being 12.7
(df = 2, p < .01), for originality, 15.8 (df = 2,
p < .01), for flexibility, 17.20 (df = 2,
p < .01), and for category width, 22.0
(df = 2, p < .01).

DISCUSSION


These results indicate that, at least as far 
as the present students were concerned, cogni- 
tive variables which may loosely be charac- 


terized as describing style rather than level 
of intellectual performance accounted for sig- 
nificant portions of the variance of science 
achievement. The data thus support the 
notion that one of the reasons why only some 
able people become successful scientists may 
be because there is a particular kind of 
cognitive organization appropriate to science 
achievement after level (IQ) effects have been 
removed. They also provide some suggestions 
of what such organizations involve. The most 
successful science students in the present 
study were characterized by highly abstract 
and original thinking and by their character- 
istic ways of relating apparently discrepant 
data. These findings are consistent with others 
based on scores of unusually successful under- 
graduate scientists in an Australian university 
(Cropley, 1967b). In a longitudinal study 
covering the 4 yr. required for an under- 
graduate honors course in science, it was 
shown that men graduating with honors came 
almost exclusively from among those who had 
been rated highly divergent in their style of 
thinking on entry to the university 4 yr. 
previously. 

Of course the present data do not show that 
the relationships between achievement and 
what is here called intellectual style are 
peculiar to science achievement. It is possible 
that the same relationship holds for achieve- 
ment in any area. However, Hudson (1963a, 
1963b, 1966) has demonstrated the existence
of marked differences between arts and science
specialists in terms of intellectual bias. His 
findings, which are well established as far as 
English grammar school boys are concerned, 
show that differences exist between science 
achievers and nonscience specialists in areas 
similar to those involved in the present study. 
Thus, although the possibility of a general 
and simple relationship between style of intel- 
lect and achievement cannot be discounted 
on the basis of the data reported here, other 
studies suggest that this may not be the case. 

Sex differences on the cognitive variables 
are interesting in the light of a report of 
Broverman, Klaiber, Kobayashi, and Vogel
(1968) that differences in cognitive perform- 
ance between males and females may well 
reflect physiological rather than sociological 
differences between the sexes, centering on the 
greater capacity of males for inhibition of 
ongoing overlearned behaviors in favor of 
more complex, original responses. Such differ- 
ences were not revealed by the IQ test, on 
which the boys’ mean of 117.2 (SD = 10.3) 
was almost identical to the girls’ mean IQ of 
117.4 (SD = 10.2). In the main, too, the sex 
differences were in the direction of superior 
achievement by males. 

Since the tests were all administered at the 
same time, it is not clear whether the existence 
of cognitive organizations appropriate to high 
science achievement led certain individuals to 
enter science in the first place, or whether 
effective science training resulted in the in- 
creased scores of high achievers on the style 
variables. Possibly, nonintellective factors, for 
example, temperament, desire for status, 
parental pressure, etc., induce students to 
enter science, success going to those who ac- 
quire most successfully the necessary style of 
functioning as a result of their scientific train- 
ing. The present data do not indicate which 
is the case. Nonetheless, the longitudinal 
study of Australian undergraduates already 
mentioned showed that potential honors gradu- 
ates were already identifiable in terms of in- 
tellectual bias 4 yr. before graduation, at a 
time when neither achievement nor IQ scores 
differed from those of ultimately less successful
students. In any case, the suggestion that
intellective variables other than IQ are significantly
connected with academic success is
particularly relevant to current concern with
early recognition of children with unusual
talent.

PATTERN ANALYSIS OF BIOGRAPHICAL PREDICTORS
OF SUCCESS AS AN INSURANCE SALESMAN 


ROBERT TANOFSKY, R. RONALD SHEPPS,1 AND PAUL J. O'NEILL 2


Sales Personnel Research, Metropolitan Life Insurance Company, New York City 


The investigators attempted to examine, with the aid of a computer program, 
all possible combinations of biographical items associated with different levels 
of sales production. The Ss were 1,525 life insurance salesmen. A combination 
of high prior income and more than two dependents at the time of application 
was found descriptive of high producers. Low producers were characterized 
by low earnings prior to their appointment. Age, education, marital status, and 
sales experience were of negligible importance. 


The present investigation is part of a long- 
term effort to clarify the relationship between 
biographical characteristics of life insurance 
salesmen and their actual sales performance. 
Studies by Ferguson (1960) and Shepps, 
Tanofsky, and Mead (1967) have shown eco- 
nomic history to be predictive of sales pro- 
duction during the first 1-3 yr. on the job. 
The magnitude of this relationship is sug- 
gested by the correlation of .32 observed in 
the latter study between salary and first 
year sales. 

In the present study it was decided to 
examine jointly salary and five other bio- 
graphical predictors. Education, number of 
dependents, marital status, age, and previous 
sales experience have all been studied indi- 
vidually by the Life Insurance Agency Man- 
agement Association (1954). They all are 
reported to be predictive of success as a life 
insurance agent, although in statistically com- 
plex ways. It was decided to examine these 
predictors by means of pattern analysis, a 
technique for the inspection of the combined 
pattern of scores contained within a set of 
predictors. 

Meehl (1950) gave an early demonstration 
of the manner in which the pattern formed by 
two dichotomized predictors could show very 
high validity despite the zero validity of the 


1 Requests for reprints should be sent to R. R. 
Shepps, Marketing and Field Management, Metro- 
politan Life Insurance Company, 1 Madison Avenue, 
New York, New York 10010. 

2 The authors wish to acknowledge A. Trokenheim’s
contribution to the early portion of the present
research. They also wish to express appreciation to 
the Service Bureau Corporation, New York City, 
for their help in securing the pattern analysis. 
predictors taken singly. Using two trichotomized
MMPI scales in combination, Lykken
and Rose (1963) were able to make more 
accurate predictions than was possible with 
the conventional multiple-regression formula 
using all 11 predictors. A review of the clinical 
use of the technique of pattern analysis is 
now available (Sines, 1966). 
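
The configural effect Meehl described can be illustrated with a minimal sketch. The data below are hypothetical and are not taken from Meehl (1950) or from the present study: two dichotomous predictors each correlate zero with a dichotomous criterion, yet the pattern "both predictors agree" identifies the criterion exactly.

```python
from statistics import mean

# Hypothetical cases: (x1, x2, y), with y = 1 only when the two predictors agree.
cases = [(x1, x2, 1 if x1 == x2 else 0)
         for x1 in (0, 1) for x2 in (0, 1) for _ in range(25)]

def correlation(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

x1 = [c[0] for c in cases]
x2 = [c[1] for c in cases]
y = [c[2] for c in cases]
# The configural score codes the joint pattern of the two predictors.
pattern = [1 if a == b else 0 for a, b in zip(x1, x2)]

r_x1 = correlation(x1, y)            # validity of predictor 1 alone
r_x2 = correlation(x2, y)            # validity of predictor 2 alone
r_pattern = correlation(pattern, y)  # validity of the pattern
```

In this constructed case each single predictor has zero validity while the pattern has perfect validity, which is the extreme form of the situation a linear composite cannot capture.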

Of more immediate interest is Sorenson’s 
(1964) investigation. In one of the rare uses 
of the configural approach within industry, 
Sorenson found he could make more accurate 
predictions of sales effectiveness using pat- 
terns of personal characteristics taken four 
at a time than was possible for any weighted 
composite of biographical items taken one at 
a time. In other words, Sorenson’s data sug- 
gest that combinations of personal history 
items can predict a sales criterion more 
successfully than can the classical linear- 
regression technique. Dunnette’s (1966) theo- 
retical model of the ideal personnel selection 
procedure also calls for the use of a combina- 
tion of predictors to select those applicants 
most likely to succeed. 

Since the foregoing literature gave no indi- 
cation of the patterns to be expected from the 
six variables under investigation, the more 
general hypothesis was tested that specific 
patterns would be found which would be 
related to specific levels of production. 


METHOD 
Procedure 


A computer program was used to examine rapidly 
all possible combinations of the 26 intervals associ- 
ated with the six predictive variables (Table 1). 
Sonquist and Morgan’s (1964) program was in fact 
TABLE 1
CATEGORIES OF THE PREDICTOR VARIABLES

                                                   Category
Variable             1               2                 3                4                5            6
Age                  Under 25 yr.    25-29 yr.         30-39 yr.        Over 39 yr.
Prior weekly salary  Less than $90   $90-109           $110-129         Over $129
                     weekly          weekly            weekly           weekly
Sales experience     Tangible only   Intangible only   Both kinds       None
Marital status       Single          Married           Other
Dependents           0               1                 2                3                4            More than 4
Education            Grade school    Attend high       Graduated        Attend college   Graduated
                     only            school            high school                       college





written especially for the many biographical and 
demographic variables whose intervals consist of 
simple categories or classifications, such as “Married,” 
“Single,” and “Other” for the marital status variable. 
The program did not make any assumptions of 
linearity or additivity. Combinations of intervals 
were reported in the order in which they accounted 
for sales production variance. In effect, the program 
divided the total group of Ss into mutually exclusive 
subgroups of increased homogeneity with respect to 
the criterion. Such partition required at least 25 Ss 
in each group. The minimum sum of squares of 
the group must be at least 1.5% of the total sum of 
squares. In order for a group to be formed, a 
minimum reduction of 5% in the error sum of 
squares also had to occur. The program automatically 
stopped when it was unable to use the remaining 
intervals to divide further any segment of the sample. 

It must be stressed that the computer simply used 
at every point the most important of the remaining 
categories, that is, the one which would bring about 
the greatest reduction in the remaining error sum of 
squares. Statistical significance was not taken into 
consideration. Sonquist and Morgan (1964) observe 
that “It seems unreasonable to apply ordinary sta- 
tistical tests at each split; that is, to insist that the 
split be a statistically significant difference between 
the two means. It is the best of a large number of 
splits at each stage [p. 114].” The present investi- 
gators nevertheless applied statistical tests of signifi- 
cance to the outcome of the pattern analysis to 
determine whether the groups that emerged were 
reliably different from each other on the criterion 
measure, and also to determine whether the program 
had succeeded in accounting for a significant amount
of the total variance. 
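
The splitting rule described above can be sketched as follows. This is a simplified illustration and not Sonquist and Morgan's program: the function name `best_split`, the record layout, and the sample data are all hypothetical. At one node it finds the binary split that most reduces the error sum of squares, subject to the minimum group size of 25; the 1.5% and 5% stopping thresholds are not implemented.

```python
from collections import defaultdict

def best_split(rows, predictors, criterion, min_n=25):
    """Return (reduction, predictor, left_categories) for the best binary split, or None."""
    n = len(rows)
    total = sum(r[criterion] for r in rows)
    best = None
    for p in predictors:
        groups = defaultdict(list)
        for r in rows:
            groups[r[p]].append(r[criterion])
        # Order categories by criterion mean, then try each cut point in that order.
        ordered = sorted(groups, key=lambda c: sum(groups[c]) / len(groups[c]))
        left_n, left_sum, left_cats = 0, 0.0, []
        for cat in ordered[:-1]:
            left_n += len(groups[cat])
            left_sum += sum(groups[cat])
            left_cats.append(cat)
            right_n, right_sum = n - left_n, total - left_sum
            if left_n < min_n or right_n < min_n:
                continue  # both subgroups must contain at least min_n Ss
            # Between-group sum of squares equals the reduction in error sum of squares.
            reduction = (left_sum ** 2 / left_n + right_sum ** 2 / right_n
                         - total ** 2 / n)
            if best is None or reduction > best[0]:
                best = (reduction, p, list(left_cats))
    return best

# Hypothetical records: income matters a great deal, marital status very little.
rows = ([{"income": "low", "marital": "single", "sales": 200}] * 30
        + [{"income": "low", "marital": "married", "sales": 220}] * 30
        + [{"income": "high", "marital": "single", "sales": 300}] * 30
        + [{"income": "high", "marital": "married", "sales": 320}] * 30)

result = best_split(rows, ["income", "marital"], "sales")
```

Applying the same function again to each resulting subgroup, until no admissible split remains, would give the kind of tree of mutually exclusive subgroups the procedure describes.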


Subjects 


The Ss for this investigation were 1,525 male 
agents appointed during the year 1963 by Metro- 
politan Life Insurance Company who survived at 
least four quarters after the time of appointment. 
The biographical information on each salesman was 


obtained from the application blank he filled out at 
the time he applied to the Company. 


RESULTS 


The results of the computer program for 
pattern analysis are reported in Figure 1. 
Prior income and number of dependents 
accounted for virtually all the explicable 
production variance. Prior income was the 
more important variable, and therefore ap- 
peared first. The group of 417 agents who 
reported earning less than $90 prior to their 
appointment averaged only $235,000 in first 
year sales. The average production of the 
remainder of the group was $281,000. None 
of the other variables in the study proved 
capable of further dividing the group of 417 
agents. The remaining 1108 agents were again 









TOTAL GROUP (N = 1525): Av. Production = $268,000
    Previous Income: Under $90 Weekly (N = 417): Av. Production = $235,000
    Previous Income: $90 Weekly or More (N = 1108): Av. Production = $281,000
        Previous Income: $90-$129 Weekly (N = 900): Av. Production = $272,000
        Previous Income: $130 Weekly or More (N = 208): Av. Production = $320,000
            Number of Dependents: 2 or less (N = 73): Av. Production = $285,000
            Number of Dependents: 3 or more (N = 135): Av. Production = $338,000

FIG. 1. Results of pattern analysis of biographical
predictors of sales production (1963 appointees
surviving at least 4 quarters).
TABLE 2
BIOGRAPHICAL ITEMS AND ITEM COMBINATIONS MOST HIGHLY RELATED TO SALES

Rank  Group  Characteristic(s)                                 Average sales   SD         N
1     1      More than $130 prior weekly income and three
             or more dependents                                $338,000        $156,000    135
2     2      More than $130 prior weekly income and two
             or less dependents                                 285,000          97,000     73
3     3      Prior weekly income between $90 and $129           272,000         115,000    900
4     4      Prior income under $90 weekly                      235,000          96,000    417
             Total group                                        268,000         117,000   1525





TABLE 3

VALUES OF STUDENT’S t FOR AVERAGE PRODUCTION OF GROUPS IDENTIFIED THROUGH PATTERN ANALYSIS

Group
1. More than $130 prior weekly and 3 or more dependents
2. More than $130 prior weekly and 2 or less dependents
3. Prior income between $90 and $129 weekly
4. Prior weekly income less than $90

* p < .05, two-tailed test.
** p < .01, two-tailed test.
*** p < .001, two-tailed test.


partitioned by income into a group of 900 
agents who averaged $272,000 and a group of 
208 whose sales averaged $320,000. The group 
of 208 agents was then split according to 
number of dependents. An extremely high 
producing group of 135 agents emerged, along 
with a more moderately producing group of 
73 agents. The large group of 900 agents 
could not be further divided by any of the 
remaining variables. The two groups formed 
on the basis of number of dependents were 
also not divisible. 
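
The internal consistency of the partition can be checked with a short calculation. The sketch below uses the group sizes and average production figures reported in Figure 1 and Table 2 (in thousands of dollars); the variable and function names are illustrative. Because the published means are rounded, pooled values are expected to agree with the reported ones only to within about $1,000.

```python
# (N, average first-year sales in $1,000s) for the four terminal groups.
groups = {
    "high income, 3+ dependents": (135, 338),
    "high income, 0-2 dependents": (73, 285),
    "middle income": (900, 272),
    "low income": (417, 235),
}

def pooled(keys):
    """Return (total N, N-weighted mean production) over the named groups."""
    n = sum(groups[k][0] for k in keys)
    weighted = sum(groups[k][0] * groups[k][1] for k in keys)
    return n, weighted / n

n_high, mean_high = pooled(["high income, 3+ dependents",
                            "high income, 0-2 dependents"])
n_90_plus, mean_90_plus = pooled(["high income, 3+ dependents",
                                  "high income, 0-2 dependents",
                                  "middle income"])
n_total, mean_total = pooled(list(groups))
```

The pooled sizes reproduce the parent groups of 208, 1108, and 1525 agents, and the pooled means fall within rounding of the reported $320,000, $281,000, and $268,000.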

The picture of the high producer given by 
this analysis is that of a man whose reported 
income prior to his appointment as an agent 


TABLE 4

SUMMARY OF ANALYSIS OF VARIANCE OF PRODUCTION
OF GROUPS IDENTIFIED BY COMPUTER PROGRAM
FOR PATTERN ANALYSIS

Source of variation    df       MS        F
Between groups           3    388,241    2.98*
Within groups         1521    130,065

* p < .05.


           Group 1   Group 2   Group 3   Group 4
Group 1       —      [entries illegible in source]
Group 2       —         —       0.87     3.30 [significance marker illegible]
Group 3       —         —        —       4.62***





was in excess of $130 per week. This individ- 
ual also reported on his application blank 
at least three dependents. The program iden- 
tified the typical man who reported a prior 
salary of under $90 a week as a low producer. 

Table 2 summarizes the key characteristics 
of the four groups found by the pattern analysis
program. The result of a t test of the
production differences between these four 
groups is given in Table 3. The computer pro- 
gram appears, for the most part, to have suc- 
ceeded in identifying groups reliably different 
in first year sales production. The analysis of 
variance reported in Table 4 suggests that the 
program also was successful in accounting for 
a significant portion of the total sum of 
squares. 
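
As a brief arithmetic illustration, the F ratio in Table 4 is the ratio of the two mean squares, and the degrees of freedom follow from the number of groups and agents. The sketch below uses only the values printed in Table 4; the variable names are illustrative.

```python
n_groups = 4        # groups identified by the pattern analysis
n_agents = 1525     # total sample

df_between = n_groups - 1          # 3
df_within = n_agents - n_groups    # 1521

ms_between = 388_241   # mean square between groups (Table 4)
ms_within = 130_065    # mean square within groups (Table 4)

f_ratio = ms_between / ms_within   # ratio of mean squares
```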


DISCUSSION 


The general hypothesis of this study was 
confirmed. Distinct patterns were found. 
These patterns accounted for a significant 
proportion of the criterion variance. 

Pattern analysis appears, on the basis of 
the authors’ experience, to be a promising 
technique for the analysis of the kind of biographical
variables of interest. If the findings
survive cross-validation in a new sample they 
will be important considerations in recruiting 
and selection. 

The results of the study are in agreement 
with the literature in that prior salary and 
number of dependents were again found to be 
associated with the sales production of life 
insurance agents. The present investigation 
added the information that prior income and 
number of dependents were sufficiently impor- 
tant in accounting for criterion variance to 
make superfluous any contribution of age, 
education, marital status, or previous sales 
experience. 

No cross-validation was attempted. Since 
the program required groups to be of appre- 
ciable size before partitioning could occur, the 
entire sample was needed for the original 
analysis. It was believed that the small hold- 
out group that could have been spared would 
not reproduce the findings of the major sam- 
ple. A small sample would not provide groups 
of the required size. Later agent groups will be 
used for the purpose of cross-validation. The 
large number of Ss remaining in certain of 
the groups, most notably the group of 900 
agents, makes it desirable to add more bio- 
graphical items to future analyses in order to 
achieve finer groupings, and hence, more
sensitive prediction. 
BRAND AWARENESS:


DIFFERENTIAL ROLES OF FITTINGNESS AND
MEANINGFULNESS OF BRAND NAMES 1


RABINDRA N. KANUNGO 2


Dalhousie University, Halifax 


Three experiments were performed to assess the comparative influence of fit- 
tingness and meaningfulness on the two stages of brand awareness: the response 
learning of brand names and the learning of brand-product association. The 
use of free-recall method in Experiments I and II revealed that only meaningful- 
ness and not fittingness of brand names influences the response learning stage. 
Response learning was better for high meaningful than for low meaningful 
brand names. In Experiment III, an associative matching task was used to study 
the effects of meaningfulness and fittingness on the associative learning stage. 
The results suggest that while both the variables influenced the learning of brand- 
product association, it was fittingness of brand names that served as a better 
predictor of associative learning. Associative matching was better for the fitting 
than for the nonfitting brand names under both high and low meaningfulness 
conditions. Only in the restricted case of nonfitting brand names did meaningfulness
influence associative learning.


Recent studies (Kanungo, 1968; Kanungo 
& Dutta, 1966) have shown that the mean- 
ingfulness and fittingness of brand names are 
potent variables influencing brand awareness. 
In these studies, the meaningfulness of a brand 
name was determined by the mean rating of 
the brand name by a group of Ss on a mean- 
ingfulness scale ranging from most to least 
meaningful categories. The fittingness of a 
brand name was determined by the degree of 
resemblance (in form, sound, and meaning) 
the brand name had to a common associate 
of the product which the brand name repre- 
sented (Kanungo, 1968). With these opera- 
tional measures for the two variables, the 
series of three experiments reported below was 
conducted to determine clearly differential 
roles of meaningfulness and fittingness on 
brand awareness. 

It has been pointed out earlier (Kanungo 
& Dutta, 1966) that an advertisement of a 
product essentially contains a pair of items: 


1 The study was supported partly by Grant No.
X-12-179 from the National Research Council of 
Canada and partly by Grant No. X-84-124 from 
Dalhousie University Research Development Fund. 
The author is grateful to Marcia Earhard for many 
helpful comments. 

2 Requests for reprints should be sent to the
author, Department of Psychology, Dalhousie Uni- 
versity, Halifax, Nova Scotia. 
the product and the brand name. It is the
purpose of the advertiser to make the con- 
sumer learn and retain the association between 
this pair of items. The situation, therefore, 
is analogous to paired-associate learning, 
which involves two separate stages: response 
learning and association learning (Underwood 
& Schulz, 1960). In order to learn pairs of 
items, the learner must first acquire the indi- 
vidual items themselves (response or free- 
recall learning), and then he must acquire an 
association between the pairs, so that given 
one of the items of the pair he can readily 
recall the other item (association learning or 
associative hookup). In testing for brand 
awareness, therefore, these two stages must 
be distinguished. It is conceivable that a 
consumer may recall a brand name, but may 
not be able to recall the product it represents. 
In this case, the consumer shows evidence for 
response learning but shows no evidence for 
having learned the brand-product association. 

Earlier studies (Murdock, 1960; Under- 
wood & Schulz, 1960) have suggested that 
response learning of words is linearly related 
to their frequency of usage or familiarity. 
The attribute of meaningfulness has been 
shown to covary with familiarity (Noble, 
1953, 1954). Thus it may be hypothesized 
that learning and recall of brand names alone 
(without reference to the product they represent)
will be directly determined by the
meaningfulness of the brand names. The 
fittingness variable, on the other hand, should 
have direct influence on the acquisition of 
brand-product association because, by defini- 
tion, the fitting brand names have closer 
resemblance with existing common associa- 
tions to the product than the nonfitting 
brand names. 


EXPERIMENT I 


Experiment I was designed to study the 
effect of the meaningfulness variable on the 
response learning (or free recall) of the 
brand names. The brand names were pre- 
sented to Ss in such a manner that the brand 
names appeared to vary only in their meaningfulness
(high or low) but not in their
fittingness characteristics. 


Subjects 


Thirty-two undergraduate students (16 males and 
16 females) served as Ss. These Ss had not partici- 
pated in any kind of psychological experiment before. 


Materials and Procedure 


All the 48 brand names selected and reported in 
an earlier study (Kanungo, 1968) were used in the 
present study. These brand names had been selected 
on the basis of their meaningfulness and fittingness 
for 12 different products. Thus for each product there 
were four brand names: high meaningful-fitting 
(HM-F), low meaningful-fitting (LM-F), high 
meaningful-nonfitting (HM-NF), and low meaningful- 
nonfitting (LM-NF). 

Four lists, each containing a different set of 12 
brand names, were prepared from the total 48 brand 
names for the purpose of this experiment. Each list 
contained only the brand names of each of the 12 
products in such a manner as to include 3 HM-F, 
3 HM-NF, 3 LM-F, and 3 LM-NF brand names. 
Eight Ss were randomly assigned to each list. The 
Ss were made to believe that the words contained in 
the list were real brand names used by some adver- 
tisers for their products. They were told that as a 
part of an advertising research E was interested 
in knowing the reactions of Ss to each of the brand 
names. 
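
One way to build four lists that satisfy the constraints stated above is a rotation scheme, sketched below. The source does not say how brand names were actually allocated to lists, so the rotation rule and all names here are assumptions: each list holds one brand name per product, three names of each type, and the four lists together use each of the 48 names exactly once.

```python
from collections import Counter

TYPES = ["HM-F", "HM-NF", "LM-F", "LM-NF"]
N_PRODUCTS = 12

def build_lists():
    """Return four lists of (product index, brand-name type) pairs."""
    lists = []
    for list_index in range(4):
        # Rotating the type by list index gives each product a different type per list.
        lists.append([(p, TYPES[(p + list_index) % 4]) for p in range(N_PRODUCTS)])
    return lists

lists = build_lists()
type_counts = [Counter(t for _, t in lst) for lst in lists]
all_pairs = {pair for lst in lists for pair in lst}
```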

Each S was given a list of the brand names in 
random order without any mention of the products 
the brand names might represent. The S was asked 
to rate each brand name for its meaningfulness and 
its appropriateness (or suitability) for being used in 
the advertisements. Twelve answer sheets were pro- 
vided to S, one for each brand name, to record 
his ratings on the two printed 6-point scales of 
meaningfulness and appropriateness. The 6 points on
the scales were labeled as follows: very high, high, 
moderate, low, very low, none. The S was instructed 
to rate the brand names by underlining the appropri- 
ate verbal label. The order of presentation of the two 
scales was counterbalanced for each S. To ensure that 
Ss paid proper attention to the brand names in the 
list, each S was asked to write down the brand name 
at the top of the rating sheet. 

Each S was given approximately 1 min. to rate 
each brand name on the two scales. After S had 
finished rating all the brand names, the list and the 
answer sheets were collected. Then S was given 
5 min. to recall and write down as many brand 
names from the list as possible. 


Results and Discussion 


The Ss’ ratings of the brand names on the
two scales were converted to ordinal weights
ranging from 5 (very high) to 0 (none) for
the 6 points on the rating scales. Each brand
name was rated by 8 Ss. Taking the mean
ratings for each brand name, it was observed
that the 24 HM brand names had significantly
higher mean meaningfulness values (mean
m' = 3.62) than the 24 LM brand names
(mean m' = 1.60, t = 5.66, p < .001). This
suggests that Ss did perceive the HM and
LM brand names selected by E as differing
in meaningfulness. The appropriateness
ratings revealed that the F brand names
were considered no more appropriate (mean
appropriateness = 2.62) than the NF brand
names (mean appropriateness = 2.57, t =
0.10). This result was expected, in view of
the fact that the fittingness of the brand
names for specific products was not revealed
to Ss. The brand names had been presented
as discrete words without the context
of the product they represented. This indicated
that the experimental attempt to manipulate
meaningfulness apart from fittingness
was successful.
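
The scoring of the rating scales can be sketched as follows. The mapping of the six verbal labels to weights of 5 through 0 follows the description above; the function name and the eight example ratings are hypothetical.

```python
# Ordinal weights for the six verbal labels on each rating scale.
WEIGHTS = {
    "very high": 5,
    "high": 4,
    "moderate": 3,
    "low": 2,
    "very low": 1,
    "none": 0,
}

def mean_rating(labels):
    """Mean ordinal weight of the labels underlined by the raters."""
    return sum(WEIGHTS[label] for label in labels) / len(labels)

# Hypothetical ratings of one brand name by the 8 Ss who saw it.
example = ["very high", "high", "high", "moderate",
           "high", "very high", "moderate", "high"]
example_mean = mean_rating(example)
```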

Effect of meaningfulness. An analysis of
variance was performed on the free-recall
scores using a Treatments × Treatments × Subjects
design (Lindquist, 1953). The two treatment
classifications were HM versus LM and
F versus NF brand names. Only one main
treatment effect, HM versus LM brand names,
was significant (F = 15.41, df = 1/31, p <
.01). The mean recall scores for HM-F and
LM-F brand names were 1.81 and 1.19, respectively
(t = 2.95, p < .01). The mean recall
scores for HM-NF and LM-NF brand
names were 1.78 and 1.22, respectively
(t = 2.67, p < .01). These results support the
contention that the response learning of brand
names is directly related to the meaningfulness
of the brand names. The HM brand
names were better recalled than LM brand
names in a free-recall situation.


EXPERIMENT II


In order to highlight the isolated effects of 
the meaningfulness variable on the response 
learning stage of brand awareness, the free- 
recall test in Experiment I was administered 
after the brand names had been presented as 
words in a list without the context of the 
products they represented. In practice, how- 
ever, brand names always appear in the con- 
text of the products they represent. Thus 
when a brand name appears in an advertise- 
ment, its meaningfulness as well as its fitting- 
ness for the product become salient for the 
consumer, and the two variables simultane- 
ously influence brand awareness (Kanungo, 
1968). It was not clear from the result of 
Experiment I whether only meaningfulness 
and not fittingness influences response learning 
of brand names in real advertisements where 
both variables are salient. Experiment II was
designed to answer this question. 

In addition, Experiment II attempted to
replicate the findings of Kanungo (1968) re- 
garding the effect of the utility of the product 
a brand name represents on brand-name 
recall. Unlike the results reported by Kanungo 
and Dutta (1966) using Ss from India, 
Kanungo (1968) reported that the brand 
names for male-use and female-use products 
were recalled equally well by both male and 
female Canadian Ss. However, it was noticed 
that both groups recalled fewer brand names 
of products used by both males and females, 
perhaps due to the fact that these products, 
such as adhesive tape or folders, had less 
personal significance or importance for Ss. 


Subjects 


Twenty-four adult male and 24 adult female 
undergraduate students served as Ss. These Ss had 
not participated in any kind of psychological experi- 
ment before. 
Materials


The same materials used and described in detail 
by Kanungo (1968) were used in this experiment. 
Briefly, four commonly used products from each 
of the three product-utility categories (male-use, 
female-use, and used-by-both products) were chosen. 
For each of the 12 products, four brand names, a 
HM-F, a LM-F, a HM-NF, and a LM-NF were 
selected. Using these brand names, four advertise- 
ments for each product were prepared. Twelve copies 
of each of the 48 advertisements were printed on a 
14-cm. × 22-cm. art paper. In each printed layout,
the brand name appeared at the top of a picture of 
the product. Below the picture, a short phrase con- 
taining the product name was present. Farther down 
the page, the name and address of the advertiser (a 
fictitious manufacturing concern) were given. 

Using these advertisements, 48 booklets were com- 
piled, each containing an advertisement of each of 
the 12 products. Each booklet included a HM-F, 
a LM-F, a HM-NF, and a LM-NF brand name for 
each of the three product categories—male-use, 
female-use, and used-by-both products. The sequence
of the 12 advertisements in each booklet was ran- 
domized to ensure varied order of presentation to Ss. 
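
The booklet design can be sketched under stated assumptions. The source gives the constraints (48 booklets, one advertisement per product, one brand name of each type within each product category, randomized order) but not the allocation rule; the rotation by booklet index, the fixed random seed, and all names below are hypothetical.

```python
import random
from collections import Counter

CATEGORIES = ["male-use", "female-use", "used-by-both"]
TYPES = ["HM-F", "LM-F", "HM-NF", "LM-NF"]

def make_booklet(index, rng):
    """Return 12 ads as (category, product number, brand-name type), in random order."""
    ads = []
    for category in CATEGORIES:
        for product in range(4):
            # Rotation ensures each category contains all four brand-name types.
            ads.append((category, product, TYPES[(product + index) % 4]))
    rng.shuffle(ads)  # varied order of presentation
    return ads

rng = random.Random(0)  # seeded only so the sketch is reproducible
booklets = [make_booklet(i, rng) for i in range(48)]
copies_used = Counter(ad for booklet in booklets for ad in booklet)
```

Under this rotation each of the 48 distinct advertisements is used in exactly 12 booklets, which agrees with the 12 printed copies of each advertisement.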


Procedure 


Each S was given a booklet and was asked to 
rate each brand name contained in the booklet on 
each of two 6-point scales: meaningfulness and 
appropriateness (cf. Kanungo, 1968). The Ss were 
made to believe that the advertisements were genuine 
and that the advertiser wanted to know the reactions 
of Ss to the brand names in terms of the two scales 
before launching an extensive advertising campaign. 

The Ss were given approximately 1-1.5 min. to 
rate each brand name on the two scales. After Ss
had finished rating all the brand names, the booklets 
were collected from them. Then Ss were given blank 
sheets of paper for the free recall of brand names. 
They were asked to write down on these sheets the 
brand names from the booklets that they could recall 
within a maximum period of 5 min. The free-recall 
test, instead of an aided-recall test where the product 
names are given to Ss (Kanungo, 1968), was administered
because the former is a more sensitive
method of assessing response learning of brand names. 


Results and Discussion 


Ratings of brand names. The ratings of the 
brand names on the two scales were analyzed 
in the same manner as in Experiment I. In 
the present experiment each brand name was 
rated by 12 Ss. The Ss’ ratings indicated that 
the 24 HM brand names were significantly 
higher in meaningfulness (mean m’ = 2.92)
than the 24 LM brand names (mean m’ = 
1.99, t = 3.44, p < .01). The appropriateness 
ratings revealed that the F brand names were
considered significantly more appropriate 
(mean appropriateness value = 2.81) than the 
NF brand names (mean appropriateness value 
= 1.71, t = 4.07, p < .001). These ratings are 
very similar to those reported by Kanungo 
(1968) and suggest the validity of the experi- 
mental manipulations of meaningfulness and 
fittingness of the brand names. 

Effects of fittingness and meaningfulness. 
A 2 × 2 × 2 analysis of variance (Lindquist,
1953, Type VI design) was performed on the 
free recall of brand names to assess the rela- 
tive influence of meaningfulness and fitting- 
ness variables. The three classifications were 
meaningfulness (HM versus LM) and fitting- 
ness (F versus NF) of brand names, and Ss’ 
sex (male versus female). Only the main ef- 
fect of meaningfulness was found to be sig- 
nificant (F = 18.88, df = 1/46, p < .01). The 
interaction effects were not significant. How- 
ever, it may be pointed out that the inter- 
action between meaningfulness and fittingness 
variables approached statistical significance 
(F = 3.48, df = 1/46, .05 < p < .10). The
reason for this trend in the interaction effect 
can be seen from Table 1, where the mean 
recalls of HM-F, LM-F, HM-NF, and 
LM-NF brand names by all 48 Ss are pre- 
sented. The F ratio for meaningfulness indi- 
cated that the free recall of the HM brand 
names is significantly better than those of 
LM brand names. However, the mean recall 
scores presented in Table 1 reveal that such 
significance stems mainly from the recall of 
NF brand names. Thus when the brand names 
are of the nonfitting type, the effects of
meaningfulness on free recall of brand names
are more clearly observed (t = 2.65, p < .01).
It will also be noticed that the free recall of 
F brand names does not differ significantly 
from those of NF brand names. 

These findings are unlike those reported by 
Kanungo (1968), who used aided-recall pro- 
cedure, in which Ss were presented with the 
product names and then were asked to recall 
appropriate brand names for the products. 
Except for the recall procedures, the materials 
and the method used in both the Kanungo 
(1968) study and the present one are very 
similar. The mean recall scores reported by 
Kanungo (1968) are also presented in 
Table 1, for the purpose of comparison. It 
will be seen from Table 1 that the potential 
influence of fittingness on brand awareness as 
shown by Kanungo (1968) disappeared in the 
present study where the free-recall test was 
used. Since the free-recall test is a test of 
brand-name learning, and not of the learning 
of brand-product association, it may be con- 
cluded that it is the meaningfulness and not 
fittingness of the brand names that determines 
brand-name learning. 

Effect of product utility. The mean free- 
recall scores of the male and female Ss for 
the brand names of the three categories, male- 
use, female-use, and used-by-both products, 
are presented in Table 2. A 2 × 3 analysis of
variance performed on the free-recall scores 
revealed that only the main effect of the three 
product categories was significant (F = 4.00, 
df = 2/92, p< .025). The reason for the 
significant effect of the product-utility variable
stems from the fact that Ss recalled rela-
tively fewer brand names for the used-by-both 
products than for the other two categories 
(see Table 2). 

These results can be compared with the 
results obtained by Kanungo (1968). Table 2 
also presents the mean recall scores reported 
in the Kanungo (1968) study. Two observa- 
tions can be made from such a comparison. 
First, the recall scores obtained by Kanungo 
(1968) are higher than those reported in the 
present experiment. The reason for this ob- 
servation is quite obvious. Kanungo (1968) 
used aided-recall method in which Ss were 
presented with the product names which 
served as cues for recalling the appropriate 
brand names. In the present experiment, Ss 
were required to recall brand names without 
being given any product names to serve as 
cues for their recall. Second, the patterns of 
recall of brand names for the three product- 
utility categories in both the present and the 
earlier (Kanungo, 1968) experiments are very 
similar, despite the differences in their recall 
procedures. In both studies, there was a trend 
toward lower recall of brand names for the 
used-by-both products than for either the 
male-use or the female-use products (see 
Table 2). The lower recall of brand names 
for the used-by-both products might have re- 
sulted from the impersonal nature of the 


TABLE 2


MEAN RECALL SCORES OF MALE AND FEMALE SUBJECTS
FOR THE THREE CATEGORIES OF BRAND NAMES

(Table body not recoverable. Surviving labels: Kanungo, 1968 (aided recall); N = 24.)
products belonging to this category. For instance,
products like adhesive tape, folder, or
writing pads, used in both the experiments, 
presumably had less personal significance for 
the Ss, and hence they may have been con- 
sidered less important than the products be- 
longing to the other two categories such as 
nylons or men’s shorts. An additional
factor that might have produced the lower
brand name recall for the used-by-both
products is exposure in real life to advertise- 
ments for the products used in this study. 
The male- and female-used products such as 
shirts, nylons, girdles, etc., are more widely 
and frequently advertised than products such 
as folders or adhesive tape. This might have 
made Ss more attentive and sensitive to the 
advertisements representing the former than 
the latter type of products. Whether it is the 
personal significance of the product for the 
consumer, or the frequency of exposure in 
real life to the advertisements of the product, 
or both these factors taken together that 
influence brand-name recall is a question that 
calls for further research. 


EXPERIMENT III 


The free-recall method used in Experiments 
I and II revealed the differential effects of 
meaningfulness and fittingness variables only 
on the response learning of brand names, but 
not on the learning of brand-product associa- 
tion. Thus the final experiment was designed 
to assess the comparative influence of the 
two variables on the learning of brand- 
product association. The learning of associa- 
tions between the brand names and the 
products they represent was tested through 
an associative matching task. The Ss were 
provided with both the brand names and the 
product names after exposure to the adver- 
tisements, and were asked simply to pair or 
match the brand names with the products 
they represented in the advertisements. Such 
matching task provided a situation in which 
individual differences among Ss in their 
response learning of brand names was elimi- 
nated (since all the brand names were avail- 
able to Ss), and Ss’ matching performance 
unambiguously reflected their learning of 
brand-product associations. 
Subjects


Twelve male and 12 female undergraduate students 
served as Ss. These Ss had never participated in any 
kind of psychological experiment before. 


Materials 


The same advertisements used in Experiment II 
were also used in this experiment. This time, how- 
ever, when the advertisement booklets were com- 
piled, each booklet included 48 advertisements: four 
advertisements of each of the 12 products. The four 
advertisements for each product included one HM-F, 
HM-NF, LM-F, and LM-NF brand name. The 
sequence of the 48 advertisements in each booklet 
was randomized to ensure varied order of presenta- 
tion to Ss. 


Procedure 


Each S was given a booklet and was asked to rate 
each brand name contained in the booklet on each 
of the two 6-point scales: meaningfulness and ap- 
propriateness, in a manner similar to Experiment II. 
After the ratings were completed, Ss were given a 
single serial list of brand names and product names 
listed in random order. This procedure ensured that 
all the responses were available to each S, and it also 
controlled for differences in the response learning of 
brand names among Ss. The Ss were asked to use 
the list to pair each brand name with the product 
name it represented in the advertisements and to 
write down the pairs on a separate sheet of paper. 
A maximum of 10 min. was allowed for this 
associative matching task. 


Results 


Rating of brand names. Each brand name 
was rated on each of the two scales by 24 Ss. 
These ratings were analyzed in the same 
manner as described for Experiments I and II. 
The mean m’ values for HM and LM brand
names were 2.55 and 1.77, respectively
(t = 2.89, p < .01). The mean appropriateness
values for F and NF brand names were
2.63 and 1.67, respectively (t = 4.30, p <
.001). These results are very similar to those
of Experiment II and they again substantiate
the validity of the experimental manipulations
of the two variables: meaningfulness and
fittingness.

Effects of meaningfulness and fittingness. 
In order to assess the relative influence of 
meaningfulness and fittingness on the correct 
matching of brand names with the product 
names, a 2 X 2 X 2 analysis of variance Type 
VI design (Lindquist, 1953) was performed 
on the number of correct matchings made 
by Ss. Again the three classifications were
meaningfulness (HM versus LM brand
names), fittingness (F versus NF brand
names), and Ss’ sex. Both the meaningfulness
and fittingness variables had significant effects 
(Fs = 74.25 and 142.01, respectively, with 
df = 1/22 and p< .001 in each case). The 
matching scores of male and female Ss did 
not differ significantly (F = 2.14, df = 1/11, 
p > .05). Two of the interaction effects were
also found to be significant. The F ratio
for the interaction between meaningfulness
and fittingness was 56.53 (df = 1/22, p <
.001), and that for the interaction between
Ss’ sex and fittingness was
5.84 (df = 1/22, p < .05). The mean matching
scores of all 24 Ss for HM-F, LM-F,
HM-NF, and LM-NF brand names are pre- 
sented in Table 1. These means reveal the 
reason for the significant interaction between 
meaningfulness and fittingness. Associative
matching was significantly better for HM than
for LM brand names only when the brand
names were nonfitting (t = 11.99, p < .001).
However, associative matching was better for
F than for NF brand names regardless of the
meaningfulness of the brand names (see
Table 1), suggesting that fittingness of a
brand name may be a better predictor of
learning of brand-product association.

A comparison between the matching scores 
of male and female Ss revealed the reason for 
the significant interaction between Ss’ sex 
and fittingness. The difference between the
scores of male Ss (M = 13.83) and female Ss
(M = 17.50) for NF brand names was significant
beyond the .001 level (t = 7.34), but
for the F brand names the difference between
the scores of male Ss (M = 21.00) and
female Ss (M = 22.25) was significant only
at the .05 level (t = 2.50).


DISCUSSION 


The results of the three experiments clearly 
demonstrated the differential roles of mean- 
ingfulness and fittingness of brand names on 
the two stages of brand awareness: learning 
of brand names and learning of brand- 
product association. The results of Experi- 
ments I and II revealed that the major de- 
terminant of learning and recall of a brand 
name (regardless of the product context in 
which it appears) is its meaningfulness and 
not its fittingness characteristic. The results
of the third experiment suggested that while 
both meaningfulness and fittingness of the 
brand names may influence the learning of 
brand-product association, it is the latter vari- 
able that seems to serve as a better predictor 
of the associative learning. Under the condi- 
tions of both high and low meaningfulness, 
the fittingness characteristics of the brand 
names favored their associative learning. 
Only in the restricted case of nonfitting 
brand names did meaningfulness influence 
learning of brand-product association (see 
Table 1). Among nonfitting brand names, the 
brand-product association was formed faster 
if the brand names were of high meaningful- 
ness than if they were of low meaningfulness. 
This finding is consistent with the associative 
probability notion proposed by Underwood 
and Schulz (1960) for paired-associate learn- 
ing. A highly meaningful brand name evokes 
a larger number of associations and hence 
there is a greater possibility that it will find a 
common link or basis of association with the
product sooner than a brand name of low
meaningfulness evoking fewer associations.
INTERACTION OF ACHIEVEMENT CUES AND FACILITATING
ANXIETY IN THE ACHIEVEMENT OF WOMEN 1


W. J. McKEACHIE 2 


University of Michigan 


In three studies, 250-380 women psychology students low in facilitating anxiety
(FA) as measured by the Alpert-Haber AAT achieved better grades when
taught by teachers characterized by expectations of high standards of achievement
than when taught by teachers not so characterized, while high FA women
performed more poorly in classes taught with high standards of achievement.


Alpert and Haber (1960) developed a test 
of anxiety about achievement tests with two 
subtests—one for debilitating anxiety, the 
other for facilitating anxiety. They presented 
evidence that the Facilitating Anxiety Scale 
correlated positively with grade-point average 
(GPA) while the Debilitating Anxiety Scale 
correlated negatively with GPA. 

Use of the Alpert and Haber test at the 
University of Michigan was found to con- 
tribute little beyond standard college aptitude 
measures to the prediction of grades in the 
introductory psychology course. Scores on the 
scale, however, were found to interact con- 
sistently with a measure of achievement cues 
emitted by the instructor in predicting the 
grades of women (not men). The sample 
and procedures used in these studies are 
described by McKeachie, Lin, Milholland, and 
Isaacson (1966). 

Table 1 indicates this interaction for three 
different samples. In each sample women stu- 
dents high in facilitating anxiety did rela- 
tively well in classes low in achievement cues 
while students low in facilitating anxiety did 
relatively poorly in these classes. 

To account for this rather consistent finding 
let us look first at the specific measures of 
achievement cues and facilitating anxiety. In 
the 1961 and 1963 studies the index of 


1 The earlier data reported in this study were collected
under a grant from the Fund for Advancement
of Education. The later data were collected and 
analyzed with support from the United States Office 
of Education, Research Contracts SAE-8541 and 
4/01-001 to W. J. McKeachie, J. E. Milholland, and 
Robert L. Isaacson. Yi-Guang Lin carried out the 
data analysis. 

2 Requests for reprints should be sent to the 
author, Department of Psychology, University of 
Michigan, Ann Arbor, Michigan 48104. 


achievement cues was obtained by computing 
the mean rating which students assigned an 
instructor on the item, “He maintained definite
standards of student performance.” In
the 1958 study McKeachie et al. used the 
mean of three items: “Instructor set very high 
standards for students,” “Members of the 
class competed to do well,” and “The course 
work presented a real challenge to me.” 

A typical item on the Facilitating Anxiety 
Scale is “I work most effectively under 
pressure—as when the task is very impor- 
tant.” Students high in facilitating anxiety 
(who answered such questions affirmatively) 


TABLE 1

INTERACTION OF FACILITATING ANXIETY,
ACHIEVEMENT CUES, AND GRADES OF WOMEN

        Facilitating  Achievement        Grades
Study   anxiety       cues           A & B   C, D, & E   Total

1963    Hi            Hi               27        25        52
                      Lo               40        27        67
        Lo            Hi               37        23        60
                      Lo               20        38        58
        Total                         124       113       237

1961    Hi            Hi               55        47       102
                      Lo               54        42        96
        Lo            Hi               35        40        75
                      Lo               40        65       105
        Total                         184       194       378

1958    Hi            Hi               50        36        86
                      Lo               72        49       121
        Lo            Hi               40        43        83
                      Lo               41        49        90
        Total                         203       177       380

Note. Interaction chi-square for combined data = 5.59
with 1 df; p < .02.
were expected to do well in classes with high
achievement cues. Results in the opposite 
direction suggest that the Facilitating Anxiety 
Scale may be a test of general academic 
motivation as much as a scale of test anxiety. 
In short, women who score high on this scale 
may say they are unafraid of tests because 
they study and are prepared to the limits of 
their ability. Students who score low on the
scale may simply be unmotivated. When the 
instructor provides additional cues to achieve- 
ment, the unmotivated students begin to work. 
Since grades are relative within a class, the 
unmotivated students win more of the high 
grades. This suggestion is supported by the 
finding that on the Criteria Test of Psycho- 
logical Thinking (Milholland, 1964) and on a 
test of knowledge administered as part of the 
final examination, women high in facilitating 
anxiety did about equally well whether taught 
in high or low achievement cue sections, but
the low facilitating anxiety women more
nearly approached the achievement of the 
high facilitating anxiety women when in high 
cue sections. 
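
The direction of the interaction can be illustrated by pooling the cell frequencies of Table 1 across the three studies and computing, for each combination of facilitating anxiety and achievement cues, the proportion of women receiving A or B grades. The Python sketch below is only an illustration under stated assumptions: the names `TABLE_1` and `pooled_high_grade_rates` are hypothetical, the 1963 Hi-anxiety, Lo-cue count of C, D, and E grades is taken as 27 (the value implied by the row total of 67), and simple pooling is not claimed to be the procedure that produced the reported interaction chi-square.

```python
# Cell counts from Table 1: (A & B grades, C/D/E grades) per study,
# keyed by (facilitating anxiety, achievement cues).
TABLE_1 = {
    1963: {("Hi", "Hi"): (27, 25), ("Hi", "Lo"): (40, 27),
           ("Lo", "Hi"): (37, 23), ("Lo", "Lo"): (20, 38)},
    1961: {("Hi", "Hi"): (55, 47), ("Hi", "Lo"): (54, 42),
           ("Lo", "Hi"): (35, 40), ("Lo", "Lo"): (40, 65)},
    1958: {("Hi", "Hi"): (50, 36), ("Hi", "Lo"): (72, 49),
           ("Lo", "Hi"): (40, 43), ("Lo", "Lo"): (41, 49)},
}


def pooled_high_grade_rates(table):
    """Pool counts over studies; return the proportion of A & B grades per cell."""
    pooled = {}
    for cells in table.values():
        for key, (high, low) in cells.items():
            high_sum, n_sum = pooled.get(key, (0, 0))
            pooled[key] = (high_sum + high, n_sum + high + low)
    return {key: high_sum / n_sum for key, (high_sum, n_sum) in pooled.items()}


rates = pooled_high_grade_rates(TABLE_1)
for (anxiety, cues), rate in sorted(rates.items()):
    print(f"FA {anxiety}, cues {cues}: {rate:.3f} with A or B")
```

In the pooled counts the low facilitating anxiety women show the larger difference between high-cue and low-cue sections, which is the pattern the text describes.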
RELATION BETWEEN BIRTH ORDER
AND BEING A BEAUTICIAN


PHILIP S. VERY AND JOSEPH A. ZANNINI
Rhode Island College 


The basic premise of the study is that birth order as described by Konig is
a factor which greatly influences personality, and that personality in turn
is an important factor in vocational choice. Konig describes the second born
as being easygoing, seeking harmony and the like; beauticians (N = 210) were
chosen as a vocation typifying these traits. Of the 210 Ss, a statistically
significant number were second born.


A survey of the literature on birth order 
and personality indicates that while differ- 
ences do exist, extensive investigations and 
clarification are needed. An excellent review 
of the literature on birth order and its results 
has been presented by Altus (1966). Altus’ 
text contains summaries of studies by Galton, 
Ellis, Clarke, Apperly, Jones, and Roe, who 
all found that there was a significantly greater 
number of firstborn in eminent positions than 
those born later. 

Much less work has been done on personal- 
ity than on intelligence or success primarily 
because of the difficulties of proper measure- 
ment. The “grand old man” of birth order 
and personality research is, of course, Alfred 
Adler. Using clinical study rather than em- 
pirical research, Adler postulated a natural 
rivalry arising between siblings, the firstborn 
being “born into” a prominence in the family, 
the second born and others attempting to 
“overthrow” his position. Feelings of inferior- 
ity generating a drive toward superiority were 
the essential motivating factors in his schema. 
What must be remembered is that Adler
practiced in Vienna during the Victorian 
period and saw a clientele which consisted of 
many highly repressed, sometimes persecuted, 
overly ambitious people. The picture changes 
somewhat in America, which presents to the 
new-born a more liberal environment and 
certainly one in which the laws of primogeni- 
ture are not nearly so significant as they were 
in Europe during the past half-century. 

In his book, Brothers and Sisters, Karl 
Konig (1963) states that there is a definite 

1 Requests for reprints should be sent to Philip S.
Very, Rhode Island College, 600 Mount Pleasant
Avenue, Providence, Rhode Island 02908.


relationship between personality and birth 
order. Konig’s theory states that there are 
four basic sibling positions: only child, firstborn,
second born, and third born. The triadic
pattern of first, second, and third born re- 
peats itself through succeeding siblings. The 
firstborn is described as a defender of the 
family’s attitudes, being socially responsible, 
somewhat domineering, ambitious, aggressive, 
independent, and a leader. The second born 
is more casual, leisurely, and harmonious. His 
basic personality pattern does not contain the 
attributes that one expects a leader or a 
defender to possess. The third born is de- 
scribed as overly sensitive, withdrawn, reflec- 
tive, and usually feels lonely and segregated. 
The only child is a combination of the first 
and third born personality traits. 

The purpose of this study is to discover
whether or not there is a relation between
birth order, as described by Konig, and being
a beautician. It is hypothesized that the
proportion of second borns in a random sample
of female beauticians will be significantly
greater than expected by chance. Beauticians
were chosen to be subjects in this study, for
it was assumed that a beautician would typify
the personality traits of a second born. This
assumption was borne out by both instructors
and managers of beauty salons, for they all
emphasized such second born traits as courtesy,
cooperativeness, and self-control. The
availability of beauticians and the ease with
which they could be contacted were also
factors in selecting beauticians for the study.


METHOD


Subjects. The total sample consisted of 210 female
beauticians from the state of Rhode Island. The
beauty salons were selected by numbering two lists
of salons from the local telephone directory (one 
urban, the other rural). A table of random numbers 
was used to select 60 beauty salons from the total 
listings (85% urban and 15% rural, corresponding to 
the population of the state). 

The mean age of Ss was 25.35 yr. and the stand- 
ard deviation was 6.15. 

Questionnaire. The questionnaire used required Ss 
to indicate their own age, sibling ranking, and sex, 
plus those of their brothers and sisters. The question- 
naire also asked for the year of death of any sibling. 
The questionnaire indicated that only siblings born 
alive were to be reported. 

Procedure. The data were obtained simply by en- 
tering those salons selected and asking those beau- 
ticians there to fill out the questionnaire. 

The questionnaires were then filed into appropriate 
categories of only, first, second, and third born. The 
firstborn was defined as the first of two or more
children and did not include only children, who
are a class in themselves. As stated above, there is a
triadic pattern of first, second, and third born posi- 
tions which repeats itself through succeeding siblings. 
Therefore, fourth borns were classified as firstborns, 
fifth borns were classified as second borns, etc. 
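
The classification rule just described can be written out as a short Python sketch. The function name `konig_position` and its argument names are hypothetical and are introduced only for illustration; the rule itself (only children kept as a separate class, and the first, second, third cycle repeating from the fourth child onward) is the one stated above.

```python
def konig_position(birth_rank, family_size):
    """Return the sibling position ('only', 'first', 'second', or 'third').

    birth_rank is the respondent's ordinal position among live-born
    siblings (1 = eldest); family_size is the number of live-born children.
    """
    if family_size < 1 or not 1 <= birth_rank <= family_size:
        raise ValueError("birth_rank must lie between 1 and family_size")
    if family_size == 1:
        return "only"
    # The triad repeats: ranks 1, 4, 7, ... are 'first'; 2, 5, 8, ... 'second'.
    return ("first", "second", "third")[(birth_rank - 1) % 3]


print(konig_position(4, 6))  # a fourth born is classified with the firstborns
```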


RESULTS 


Of the 210 Ss, 15 were only children, 74 
were firstborns, 87 were second borns, and 34 
were third borns, yielding 7.14%, 35.24%,
41.43%, and 16.19%, respectively. In order 
to obtain an expected proportion of second- 
born females that would be appropriate, the 
data on second-born females, born from 1924 
to 1948 (according to the National Office of 
Vital Statistics, 1959), were used. 

Since size of family is affected by such 
external environmental factors as depressions, 
wars, and the like, it was decided to compute 
what may be called a weighted average of 
the census percentages for second borns. Ordi- 
nary averages over 5-yr. periods from 1924 
through 1948 were computed; then these 
averages were weighted by the numbers in the 
actual sample born in those 5-yr. periods in 
order to arrive at an appropriate expected 
percentage of second borns for theoretical 
comparisons. The final weighted average ex- 
pected value was 33.09%. 
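
The weighting step can be sketched in Python as follows. The census percentages and the period counts used here are hypothetical placeholders, since the article reports only the final value of 33.09%; the function name `weighted_expected_percentage` is likewise an assumption made for illustration.

```python
def weighted_expected_percentage(period_percentages, sample_counts):
    """Average the period percentages, weighting each by the number of
    sample members born in that period."""
    if len(period_percentages) != len(sample_counts):
        raise ValueError("one count is needed for each period")
    total = sum(sample_counts)
    if total == 0:
        raise ValueError("the sample counts must not all be zero")
    weighted_sum = sum(p * c for p, c in zip(period_percentages, sample_counts))
    return weighted_sum / total


# HYPOTHETICAL placeholder inputs for the five 5-yr. periods 1924-1948;
# these are NOT the census figures or sample counts used in the study.
hypothetical_percentages = [30.0, 32.0, 34.0, 33.0, 31.0]
hypothetical_counts = [10, 20, 40, 80, 60]  # sums to 210

expected = weighted_expected_percentage(hypothetical_percentages,
                                        hypothetical_counts)
print(f"{expected:.2f}%")
```

Weighting by the sample's own birth-year distribution keeps the expected value comparable to a sample whose members were not born evenly across the 25 years.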

The possible significance of the larger num- 
ber of second borns was tested by using the 
significance of the difference of a sample per- 
centage from a theoretical value. This yielded 
a standard deviation of 3.16%; therefore, 
41.43% is 2.64 standard deviations above the
expected value of 33.09% and is thus significant
at the .01 level.
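
The arithmetic of this comparison can be retraced with a brief Python sketch. The first computation uses only the figures reported above. The second uses the usual binomial formula for the standard error of a proportion under the expected value, which is an assumption about the method; the article does not say how its standard deviation of 3.16% was obtained, and the two results need not agree.

```python
import math

n = 210                 # number of beauticians in the sample
observed = 87 / n       # observed proportion of second borns
expected = 0.3309       # weighted expected proportion (33.09%)
reported_sd = 0.0316    # standard deviation reported in the text (3.16%)

# Deviation of the sample proportion in units of the reported SD.
z_reported = (observed - expected) / reported_sd

# Assumed alternative: binomial standard error under the expected proportion.
se_binomial = math.sqrt(expected * (1 - expected) / n)
z_binomial = (observed - expected) / se_binomial

print(f"z from the reported SD: {z_reported:.2f}")
print(f"binomial SE: {se_binomial:.4f}, z: {z_binomial:.2f}")
```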


DISCUSSION 


Although the results obtained are statisti- 
cally significant, there are many interacting 
variables which seem to confound the re- 
sults and reduce the possibility of even greater 
significance. It was indirectly observed, while 
distributing the questionnaires, that there are
different motivating factors which lead a
female to enter the field of cosmetology. This
is especially true since it takes relatively 
little formal training and intellectual ability 
to be a licensed beautician. 

Some beauticians reported
that they became beauticians because they
felt that with a license they could, when 
necessary, work at home and still raise a 
family. There are others who are totally 
engulfed with the idea of being artistic and 
creative as beauticians. And there are status 
seekers who try to capitalize on the artistic 
end of the field and attempt to project an 
image of the beautician as a highly trained 
and skilled professional. It may be noted
here that although the state has only one 
license for beauticians, there are different 
occupational titles, such as cosmetologist, hair 
stylist, and beautician, which are usually only 
used in a status sense. In essence, these 
different attitudes reflect different birth order 
personalities. 

In spite of the overwhelming evidence 
favoring the success of the firstborn, virtually 
nothing has been done in the United States 
to try to provide environmental compensa- 
tion to the ordinal positions. That a child 
should be seriously affected in his life’s goals 
as a result of his order of birth is a possibility 
abhorrent to humanitarian America. Never- 
theless, the weight of research is beginning 
to bear this out. What is needed is continued 
clarification of the reasons for this achieve- 
ment differential. One of the most promising 
avenues is, of course, the study of the per- 
sonality differences and motivation which are 
at least partially responsible for the differ- 
PSYCHOLOGICAL CONCOMITANTS AND DETERMINANTS
OF VOCATIONAL CHOICE 1


KENNETH M. KUNERT 2 


University of California, Berkeley 


The Vocational Life Patterns (VLP) Q Sort was developed and used to investi- 
gate the personality—vocational-choice relationship. Groups of 75 student Ss, 
50 Ss for the initial study and 25 Ss for the cross-validation study, were drawn 
from the schools of law, medicine, theology, and engineering. Results of the 
student Q-sort distributions confirm that the motivational and personality self- 
concept descriptions of the VLP Q Sort do distinguish these different voca- 
tional groups at highly significant levels. Descriptions are presented of the 
different personality—vocational patterns of the four groups. It was concluded 
that the VLP Q Sort was effective in investigating the personality-vocational-choice
relationship and it was suggested that the procedure provides a means
to study the more elusive constructs of personality theories. 


For the counseling psychologist the rela- 
tionship between personality and vocational 
choice is of much interest. Ginzberg, Gins- 
burg, Axelrad, and Herma (1951) discussed 
this personality—vocational-choice relationship 
in terms of the individual’s values and goals. 
A little later, Darley and Hagenah (1955) 
envisioned the relationship as embracing one’s 
needs, value systems, and motivations. Roe 
(1956) applied Maslow’s theory of motivation 
in an attempt to unite theory and research 
findings to give evidence for this relationship. 
Super (1953, 1957) presented the well devel- 
oped theory that vocational choice implements 
the self-concept. 

The present research, stimulated by the 
above and similar ideas, has applied the 
Q-sort methodology to the question of 
the personality—vocational-choice relationship. 
This study was undertaken to determine 
whether distinctive personality patterns could 
be found for students committed to the fields 
of engineering, law, medicine, and theology. 


1 This article is based upon a doctoral dissertation 
submitted to the Graduate School of the University 
of California, Berkeley, in 1965. Appreciation is 
expressed to Harrison G. Gough, dissertation chair- 
man, and also to Lyman W. Porter and Gordon H. 
Robinson, dissertation readers, for their help and 
suggestions. Thanks are also given to Quenten Welsh 
for his assistance in executing the computer program- 
ming and analysis. 

2 Now at the Department of Psychology, University
of Detroit, Detroit, Michigan 48221. Requests
for reprints should be sent to the author at the 
above address. 
Recently, a few articles have reported using
the Q sort in this area. Englander (1960,
1961) used a variation of the Q-sort
method and found that the self-concepts of 
elementary teaching trainees did seem to 
be implemented by their vocational choice. 
Morrison (1962) used the Q-sort method to 
find that the self-concepts of nursing trainees 
and of teacher trainees were related to their 
concepts of the nursing and teaching profes- 
sions. In 1963, Neff and Helfand used a Q 
sort to study the meaning of work. They 
found that the work potential of 16 physically 
handicapped persons could be classified by 
comparing their individual sorts with a cri- 
terion sort developed in the study. All of 
these studies employed either a self- and an 
ideal-sort or a self-sort versus a criterion sort. 

The present investigation, however, used
only the self-concept sort in obtaining its
data and deriving its results. The Ss described
themselves through the Vocational Life Patterns
(VLP) Q-sort deck. It was hypothesized
that the different vocational fields being
researched could be differentiated by the
personality and motivational variables of the VLP
Q sort.


METHOD 
Subjects 


The Ss were students obtained from the voca- 
tional fields of law, medicine, theology, and engineer- 
ing. They were chosen from these fields since each 
field provides a clear and distinct vocational choice 
and at the same time the fields allow a sampling 
of the interest continuum which ranges from
humanitarian interests at one end to things/object
interests at the other end. Also, the selectivity factors
involved in getting into these fields permit the
assumption that these Ss would
show equal enthusiasm and commitment to their
choices. Seventy-five Ss were sought in each of
these professional fields with 50 Ss being randomly
assigned to the original sample and the remaining
25 Ss being assigned to the cross-validation sample.
The Ss for each sample were 21-29 yr. of age. Graduate
students were used in the engineering group so that 
all groups might be of comparable educational levels. 
The theology group combined students from Protes- 
tant and Catholic seminaries to control for differences 
of religious background and of marital status found 
in the other three groups. All Ss were obtained 
from professional schools in the San Francisco Bay 
Area. 


Procedure 


The Ss performed the VLP Q sort in a group 
setting. Each S was given the Q-sort deck containing
a title card, seven category identification cards,
and the 70 item-statement cards. Each S was also
given a combined instruction and recording sheet. 
The instructions explained the Q-sort procedure and 
defined the meaning of the seven categories which 
ranged from most characteristic to most uncharac- 
teristic. The Ss were asked to: “Please read these 
statements and then classify them according to their 
importance and accuracy in describing yourself as 
you think of yourself at the present time.” The 
recording part of the sheet contained columns for 
each of the seven categories, each column having the 
appropriate number of spaces in which to write the 
number of the statements placed in the particular
category. The item statements were to be distributed
across the seven categories as follows:
4-9-14-16-14-9-4. Items assigned to each category
were given the weighted score for that category.
The weighted scores ranged from 7 to 1 with 7 being 
assigned to the most characteristic category and 1 to 
the most uncharacteristic category and the weights of 
6 to 2 being assigned to the categories in descending 
order insofar as the category was more or less 
characteristic in its designation. 
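
The forced distribution and the weighting scheme can be sketched in Python as follows. The names `FORCED_DISTRIBUTION`, `CATEGORY_WEIGHTS`, and `score_sort` are hypothetical, and the representation of a completed sort as a mapping from item number to category index is an assumption made only for illustration.

```python
from collections import Counter

# Items allowed per category, from most to most uncharacteristic.
FORCED_DISTRIBUTION = (4, 9, 14, 16, 14, 9, 4)
# Weighted score for each category, in the same order.
CATEGORY_WEIGHTS = (7, 6, 5, 4, 3, 2, 1)


def score_sort(placements):
    """Check a completed sort against the forced distribution and return
    the weighted score of each item.

    placements maps item number (1-70) to a category index
    (0 = most characteristic, 6 = most uncharacteristic).
    """
    if sorted(placements) != list(range(1, 71)):
        raise ValueError("each of the 70 items must be placed exactly once")
    counts = Counter(placements.values())
    for category, required in enumerate(FORCED_DISTRIBUTION):
        if counts.get(category, 0) != required:
            raise ValueError(f"category {category} needs {required} items")
    return {item: CATEGORY_WEIGHTS[cat] for item, cat in placements.items()}


# Illustrative sort: items 1-4 in the first category, 5-13 in the next, etc.
example_placements = {}
next_item = 1
for category, size in enumerate(FORCED_DISTRIBUTION):
    for _ in range(size):
        example_placements[next_item] = category
        next_item += 1

example_scores = score_sort(example_placements)
print(sum(example_scores.values()) / len(example_scores))
```

Because the distribution is fixed and symmetric, every valid sort has the same mean item score, so Ss differ only in which items receive which weights.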

The 70 items of the VLP Q sort express ideas 
representative of four major topics with 18 subtopic 
areas. A delineation of the topic and subtopic areas 
follows: 

1. The recreational or leisure time topic has for
its subtopics (a) personal aspects, i.e., activities,
etc., done alone, (b) social aspects, i.e., activities
with others, (c) physical aspects, i.e., sports, hobbies,
etc. Items representative of this category are


3 Gratitude is expressed to Harrison G. Gough for
his direction and assistance in developing the items 
of the VLP Q Sort. His insistence upon refinement 
of items and suggestions in working out the final 
wording of the items were major factors in the 
production of the VLP Q deck. 
“Sociable, likes to be with people, to attend parties
and gatherings of friends and acquaintances” and 
“Likes hobbies or leisure time activities with a tech- 
nical aspect, e.g., developing and printing film, 
working with hi-fi equipment, etc.” 

2. The occupational topic divides into (a) socio- 
economic concerns, (b) satisfactions, (c) personal 
orientation toward work, (d) mobility-versatility, 
(e) talents and skills, (f) implications of vocational 
adjustment. This topic is covered by items such as 
“Financial and occupational security is of great im- 
portance” and “Prefers problems which require 
precise, exact, and logical thinking.” 

3. The personality characteristics topic involves 
(a) adjustment, (b) sense of responsibility, (c) self- 
evaluation and self-improvement, (d) values, (e) role 
considerations. The items of this category try to get 
at the personal motivation and dynamics which give 
the individual an orientation toward life, e.g., 
“Takes duties and obligations seriously, accepts 
responsibility for self” and “Aesthetic and cultural 
values are crucial to the development of character 
and personal maturity.” 

4. The interpersonal topic has as its subtopics 
(a) family and home concerns, (b) relationship to 
neighbor, (c) concerns about society, and (d) rela- 
tionships with clients, etc. These ideas are exemplified 
in items such as “The true stature of man is indi- 
cated by his interest in and concern for the welfare 
and happiness of others” and “In vocational planning 
was always able to count on the support and 
encouragement of family.” 

The 70 items are distributed in the deck as follows: 
(a) recreational category, 12 items; (b) occupational 
category, 19 items; (c) personality characteristics 
category, 23 items; and (d) interpersonal category, 
16 items.⁴

Studies of the VLP Q-sort deck have shown that
all but two of the items tend to have a mean placement
in the neutral or middle category with a
distribution standard deviation of three to five categories.
Test-retest reliability studies have provided reliability
coefficients of .74 for females and .77 for males
over a 2-wk. interval and of .73 for both males
and females over a 4-wk. interval. Cluster analyses
of the deck based on the groups used in this investigation
found one cluster of two items for each of
the individual groups and one cluster of four items
when all 343 of the possible Ss for this investigation
were used. Hence 66-68 items operate with
considerable independence of one another in the deck.

In the initial study significant differences in item 
placement were determined, following Strong’s (1943) 
method on the SVIB, by testing each group against 
the combined other three groups which were desig- 
nated as the “professional men in general” sample. 
In this manner 50 Ss in one group were compared 
with the 150 Ss in the other three groups. Differential 
item placement was determined by t test using the
Institute of Human Development t-test program for 


4 A full discussion of the development of the VLP
Q Sort can be seen in Kunert, 1965. 
the IBM 7090 computer at the University of California
Computer Center. The initial sample was then
analyzed in terms of the significant items found for 
each professional group. 

For the cross-validation study, differential Q sorts 
(DQs) were developed from the initial study results. 
The DQs were formed by ranking the items in each 
vocational group in the descending order of the t
values derived in the initial investigation. The 4
items with the largest positive t values were then
assigned to the most characteristic category, the next 
9 items in order to the quite characteristic cate- 
gory, etc., until all 70 items were assigned to the 
categories in accordance with the distribution used 
for the VLP Q sort. The Q sort so developed became 
the modal or DQ sort for that vocational group. The 
Q-sort distributions of the 100 cross-validation Ss 
(25 Ss from each of the four vocational groups) 
were then correlated with each of the four DQs. 
These correlations were treated as the individual’s 
scores on the DQs and analyses were performed to 
determine whether the DQs would differentiate the 
appropriate DQ vocational group from the other 
three groups. 
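
The construction of a DQ and the scoring of an individual against it can be sketched in Python as follows. The function names `build_dq` and `pearson_r` and the t values used in the example are hypothetical; the sketch assumes that a Pearson product-moment correlation over the 70 item weights is the correlation intended, which the text does not state explicitly.

```python
import math

FORCED_DISTRIBUTION = (4, 9, 14, 16, 14, 9, 4)
CATEGORY_WEIGHTS = (7, 6, 5, 4, 3, 2, 1)


def build_dq(t_values):
    """Rank items by t value (largest first) and fill the forced
    distribution; return the category weight assigned to each item."""
    if len(t_values) != sum(FORCED_DISTRIBUTION):
        raise ValueError("one t value is needed for each of the 70 items")
    order = sorted(range(len(t_values)), key=lambda i: t_values[i], reverse=True)
    weights = [0] * len(t_values)
    start = 0
    for size, weight in zip(FORCED_DISTRIBUTION, CATEGORY_WEIGHTS):
        for item_index in order[start:start + size]:
            weights[item_index] = weight
        start += size
    return weights


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL t values, strictly decreasing over the 70 items.
hypothetical_t = [(35 - i) / 10 for i in range(70)]
dq = build_dq(hypothetical_t)

# An S whose own sort matches the DQ scores 1.0; the mirror-image sort, -1.0.
score_same = pearson_r(dq, dq)
score_opposite = pearson_r(dq, dq[::-1])
print(round(score_same, 3), round(score_opposite, 3))
```

Each S thus receives four such scores, one per DQ, and the groups can then be compared on each scale.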


RESULTS 
Initial Study 


In this investigation each group was com- 
pared with the other three groups and the 
number of differentiating items was determined
for each group by means of the t-test
procedure. For a Q-sort deck of 70 items one 
would expect by chance alone that 1 item 
would be significant at the .01 level and 4 
items would be significant at the .05 level. 

The t-test procedure showed that for the
subsample of law students 11 out of the 70
items were significant with a p value of .01
or less. Eight more items (19 in all, 27%
of the items) were significant with a p value
of .05 or less.⁵


5 Eleven pages comprising Tables A through D 
which give the significant items for each vocational 
For the subsample of medical students 5
out of the 70 items were significant with a
p value of .01 or less. Eight more items (13 in
all, 18% of the items) were significant with 
a p value of .05 or less. 

For the subsample of divinity and theology 
students 29 out of the 70 items were signifi- 
cant with a p value of .01 or less, while 6 
more items (35 in all, 50% of the items) were 
significant with a p value of .05 or less. 

Finally, for the subsample of engineering 
students 21 of the 70 items were significant 
at the .01 level of significance or less and 
13 more items (34 in all, 49% of the items) 
were significant at the .05 level of significance 
or less. 

In the case of each subsample it can be 
seen that the number of significantly differen- 
tiating items exceeds by a considerable degree 
the chance expectations of one significant item 
at the .01 level of significance and the chance 
expectation of four items significant at the 
.05 level of significance. 


Cross-validation study 


In the cross-validation study the correlation 
between the S’s Q sort and the DQ was
taken as his score on the DQ. Each S 
obtained a score on each of the four DQs. It 
was hypothesized that the law students would 





group have been deposited with the National Auxil- 
iary Publication Service. Order Document 00195 
from National Auxiliary Publications Service of the 
American Society for Information Science, c/o CCM 
Information Sciences, Inc., 22 West 34th Street, New 
York, New York 10001. Remit in advance $3.00 
for photocopies or $1.00 for microfiche and make 
checks payable to: Research and Microfilm Publica- 

TABLE 1
t-TEST VALUES COMPARING PROFESSIONAL GROUPS TWO AT A TIME ON DQ SCALES

                                                    Group
Group             On scale for     1             2             3             4
1. Law            Law                            4.70***       13.24***      2.88**
2. Medical        Medical          [illegible]                 [illegible]   1.80*
3. Theological    Theological      9.40***       [illegible]                 13.07***
4. Engineering    Engineering      [illegible]   [illegible]   10.34***

*p = .05 (t = 1.71 with 24 df).
**p = .01 (t = 2.49 with 24 df).
***p = .001 (t = 3.74 with 24 df).

TABLE 2
CONTINGENCY TABLES FOR PROFESSIONAL GROUPS ON DQ SCALES

          Law DQ Scale                      Medical DQ Scale
Group     Law            MTE    Total       Medical        LTE    Total
Accept    19             15     34          17             16     33
Reject    6              60     66          8              59     67
Total     25             75     100         25             75     100

          Theological DQ Scale              Engineering DQ Scale
Group     Theological    LME    Total       Engineering    LMT    Total
Accept    23             8      31          22             24     46
Reject    2              67     69          3              51     54
Total     25             75     100         25             75     100


have the highest scores on the law DQ, the 
medical students on the medical DQ, etc. 

A t-test analysis of the scores of the four 
groups taken two at a time on each of the 
DQ scales gave the results found in Table 1. 

In ten of the comparisons the DQ scales 
distinguished the scale-appropriate group at
the .001 level of significance. The law scale 
distinguished the law group from the engi- 
neering group at the .005 level of significance. 
The medical scale distinguished the medical 
group from the engineering group at the .05 
level of significance. 

The DQ scales also differentiated individ-
uals according to their field of study. Analyses
of these differences were based on frequency 
distributions of the scores and selected cutoff 
points on each of the DQ scales. The distribu- 
tion of scores was analyzed by use of chi 
square. The contingency table data for this 


TABLE 3
CHI-SQUARE VALUES OBSERVED BETWEEN PROFESSIONAL GROUPS ON DQ SCALES

                                        Group
Group              Scale     MTE       LTE       LME       LMT
Law (L)            DQL       26.20*
Medical (M)        DQM                 18.47*
Theological (T)    DQT                           57.99*
Engineering (E)    DQE                                     23.67*

*p < .001 (χ² = 10.827 with 1 df).


analysis are given in Table 2. The results of 
the analyses are presented in Table 3. 
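
As a check on how the chi-square values in Table 3 relate to the contingency data in Table 2, the sketch below applies the standard uncorrected chi-square formula for a 2 × 2 table, N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)]. The choice of the uncorrected formula (no continuity correction) is an assumption of this example; it is the form that reproduces the reported values. The Engineering "Accept" cell is taken as 22, the value implied by the column total of 25 and the Reject count of 3.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for the 2 x 2 table [[a, b], [c, d]] with 1 df."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Cells per DQ scale: (own group accepted, other groups accepted,
#                      own group rejected, other groups rejected)
tables = {
    "Law": (19, 15, 6, 60),
    "Medical": (17, 16, 8, 59),
    "Theological": (23, 8, 2, 67),
    "Engineering": (22, 24, 3, 51),
}

chi_squares = {scale: chi_square_2x2(*cells) for scale, cells in tables.items()}

for scale, value in chi_squares.items():
    print(f"{scale}: chi-square = {value:.2f}")
```

All four values exceed the critical value of 10.827 cited for p < .001 with 1 df.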

The pattern of scores for each group and 
the differences among the groups are shown 
in Figure 1. All scores were transformed to 
T scores with a mean of 50 and a SD of 10 
in order to make the scales comparable. As is 
evident from Figure 1, not only the scale 
scores but also the pattern of scores are useful 


[Figure 1: plot of mean T scores for the four student groups (law, medicine, theology, engineering) on the Differential Q-Sort scales.]

FIG. 1. Mean score patterns for four student
groups on DQ scales.

in determining one’s personality propensity
for a particular vocational field. 
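
The T-score transformation mentioned above rescales a set of scores to a mean of 50 and an SD of 10, that is, T = 50 + 10(x − M)/SD. A minimal sketch follows; the function name and the sample values are hypothetical, and the use of the population SD rather than the sample SD is an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev


def to_t_scores(raw_scores):
    """Rescale raw scores to T scores with mean 50 and SD 10."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD: an assumption of this sketch
    if sd == 0:
        raise ValueError("T scores are undefined when all raw scores are equal")
    return [50 + 10 * (x - m) / sd for x in raw_scores]


# Hypothetical raw scores (e.g., correlations of Ss' Q sorts with one DQ).
raw = [0.10, 0.25, 0.40, 0.55, 0.70]
t_scores = to_t_scores(raw)
```

Because every scale is put on the same metric, a group's profile across the four DQ scales can be compared directly.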


DISCUSSION 


The initial and cross-validation studies con- 
firm the ability of the VLP Q Sort to dis- 
criminate students in the fields investigated. 
The initial study found a significant number 
of discriminating items for each group and 
ordered all of the items in a hierarchical pro- 
gression on the basis of the T scores derived 
for each item. The cross-validation results 
showed that the DQ scales developed from 
the T scores of the initial study were able to 
discriminate both the groups and the indi- 
viduals within the groups from one another. 
Thus the hypothesis of the investigation was 
confirmed. 

An added advantage of the Q-sort method 
is that it showed the hierarchy of values 
within the groups as they gave differentiating 
emphases to their particular aims, values, and 
goals. The method also allowed for a wide 
range of individual variability as the DQ for 
each group gave the salient features of the 
group’s personality profile and the individual 
sorts showed how this pattern was integrated 
idiographically within each S’s personality 
pattern. 

Personality descriptions derived from the 
discriminating items of each group are pre- 
sented below. The descriptions, however, are 
relative for they describe each group as it 
appears in comparison with the other three 
groups. To balance out the descriptions, dif- 
ferences between the relative and absolute Q 
sorts for each group will also be presented. 


Law Group 


The law student is concerned with himself, 
his position, and his status. He shows consider- 
able interest and involvement in political af- 
fairs. He sees himself as perceptive and able 
to evaluate people, problems, and issues. He 
appears, however, more concerned with the 
implications of problems than with the people 
involved in them. He prefers to work with the 
intellectual aspects of problems rather than 
with the aspects which require precise and 
logical thinking. It also appears that duties 
and obligations are of secondary concern for 
him. Religion also seems to hold a minimal
position in his pattern. Relative to the other 
groups, he finds less meaning and fulfillment 
in his life and in his work. In fine his interests 
are largely intellectual and aesthetic in nature. 

When the law group Q sort is considered 
by itself, duties and obligations are seen as 
important and life is meaningful for him. He 
accepts personal responsibility and is more 
reserved about his perceptive and evaluative 
abilities. He is also less active in expressing 
publicly his beliefs and ideas. Position and 
status are also less emphasized. He is, never- 
theless, confident of his ability to cope with 
life’s problems. 


Medical Group 


Family is of much importance for the 
medical student both in offering support and 
in providing a release from the stress and 
strain of life. He has little interest in being 
an organizer or leader of others, in getting 
involved in political affairs, and in impressing 
others. Indeed, he appears to judge others on 
the basis of surface traits and to relate to 
them in a rather reserved manner. He would 
appear to be quite controlled emotionally. 

He is primarily involved in his work, and 
although he does not like to be overly re- 
stricted, yet he tends toward conservatism and 
is reluctant to try out new ideas, techniques, 
etc., in his work. He relies on logic and reason 
for making his decisions. Physical soundness 
is important to him and he enjoys competitive 
physical activities. In general, he is content 
with himself, his work, and his relationships 
with others. 

When the medical Q sort is taken by itself, 
this student is seen to find life meaningful and 
to take his duties and obligations seriously. 
The importance of family and of hard work 
and constructive achievement are given less 
emphasis. He also sees himself as being less 
conservative and as being freer in relating to 
others. 


Theological Group 


The divinity or theology student places 
great importance on principles and beliefs
both for guiding his daily life and for evalu- 
ating others. He is other-directed and deeply 
concerned with helping others. He desires to
correct social conditions and to lead and 
inspire others. He sees himself as being able 
to have anyone as a friend, and he works to 
increase the feeling of well-being both in 
others and in the world. Even his introspec- 
tiveness is used to understand others better, 
and he develops his skills, social and other, to 
be more effective in working with and inter- 
acting with others. 

Yet his principles tend to lead him to be 
over-idealistic and even rigid in evaluating the 
world about him. He does not like all the 
tasks he may be assigned and is less intel- 
lectually involved in those which are less 
appealing. He finds his primary rewards in 
his work and holds family, home, personal 
ambition, and prestige to be of secondary 
importance. 

His intuitive qualities and his contact with 
his emotions, while generally helpful to him, 
also at times tend to interfere with his work, 
especially when he is faced with difficulty and 
frustration. 

When he is seen by himself, the theology 
student appears a little more flexible, more 
alert to prestige and financial concerns, and 
better able to deal with difficulties and frus- 
trations. Also he tends to esteem personal 
freedom more and to put less emphasis on 
the subjective and intuitive approach to life. 
Even he is not particularly fond of partici- 
pating in church clubs, suppers, and fund 
drives. In general, life is very meaningful 
for him, and he has a strong sense of respon- 
sibility for himself, his duties, and his 
obligations. 


Engineering Group 


The engineering student sees himself as 
being intellectually quick, exact, orderly, and 
analytical. He can also integrate material well 
and then present it clearly and effectively to 
others. He thrives on challenging ideas and 
problems and competitively seeks to excel in 
his work. For him, hard work and constructive 
achievement provide the means to attain 
financial and occupational security and thus 
to offer his family the best in life. In all 
of these areas he shows an ingrained sense 
of responsibility. 

He prefers hobbies requiring technical and
mechanical skills. He has a preference for 
short stories and magazine articles rather 
than for longer books and serious nonfiction 
as his desire for quickness again seems to 
predominate. 

He is sociable and enjoys being with others, 
but he does not wish to get involved in others’ 
problems, nor in social or political problems. 
He is basically pragmatic with little interest 
in personal introspection or with abstract and 
aesthetic considerations. Religious beliefs and 
church activities occupy but a small part of 
his object-oriented world. His philosophy is 
one of “live and let live,” and thus he avoids 
involvement in the lives and affairs of others 
and in turn expects the same from them in 
his own regard. 

The picture varies little when the Engineer- 
ing Group’s Q sort is considered by itself. It 
shows less emphasis for hobbies requiring 
technical and mechanical skills and less em- 
phasis on the concern for financial and occu- 
pational security. His interest in short stories, 
etc., is more moderate as is his picture of 
himself as evaluating problems quickly. Social 
concerns tend now to fall more in the neutral 
region than in the uncharacteristic region of 
his life pattern. 


REMARKS 


This investigation indicates that ipsative 
measures can discriminate the pattern of pro- 
active personality dynamics operating within 
different vocational groups. Yet, while it has 
provided some thought-provoking results, its 
major value would seem to lie in the fact that 
it has shown that it is possible to tap the ra- 
tional and motivational aspects of people’s 
lives. The Q-sort method joined with a Q deck
developed according to one’s major theoretical 
positions is capable of bringing together the 
rational and empirical approaches to psychol- 
ogy to provide data of both practical and 
theoretical import. With this method it would 
appear that difficult areas of personality 
theory can be put to the test. It might even 
make possible the attainment of a more or 
less all-embracing theory of personality upon 
which a stable, scientific structure can be 
built. 


EFFECT OF PERCEIVED EXPERTNESS UPON CREATIVITY
OF MEMBERS OF BRAINSTORMING GROUPS 


PANAYIOTA A. COLLAROS AND LYNN R. ANDERSON 1


Wayne State University 


Perceived expertise of other members may make a brainstorming group less 
effective than the pooled results of members working alone. In an All Experts
condition, each member of a brainstorming group was told that all other 
members had previously worked in such groups; while in a One Expert con- 
dition it was stated that only one member (unidentified) had this experience. 
No information was given in a Control condition. The Ss felt more inhibited 
in the All Experts than in the One Expert condition which, in turn, had more 
inhibition than the control. Originality and practicality of ideas varied ac-
cording to the degree of felt inhibition: the Control condition had the highest
originality and practicality scores, followed by the One Expert condition and 


then the All Experts condition. 


Osborn devised the brainstorming procedure 
to create a free and uninhibited atmosphere 
which would increase the creativity of group 
members. The main features of the brain- 
storming procedure described by Osborn 
(1957) are: judicial judgment of ideas is ruled
out; freewheeling ideas are welcomed; quan- 
tity is wanted; and combinations and im- 
provement of ideas are sought. Osborn’s own 
research, for example, showed that engineers 
were able to produce “44% more worthwhile 
ideas” using a group brainstorming tech- 
nique than when the members worked alone 
using other than brainstorming techniques 
(Osborn, 1957). Osborn concluded that group 
participation under the brainstorming condi- 
tions can improve significantly the creativity 
of the group members. 

One of the essential features of the brain- 
storming technique is that novel, offbeat ideas 
produced by one member of the group can 
suggest even more novel or original ideas to 
another member of the group. This effect 
should be especially evident if the members 
can contribute the unique ideas in a non- 
evaluative atmosphere without fear of censure 
from others. It is this mutual stimulation in 
a nonevaluative atmosphere that could, pur- 
portedly, give a qualitatively different di- 
mension to the group ideas compared to the 
pooled ideas of the individuals working alone. 


1 Requests for reprints should be sent to Lynn R. 
Anderson, Department of Psychology, Wayne State 
University, Detroit, Michigan 48202. 


However, recent research by Taylor, Berry, 
and Block (1958) suggests that group brain- 
storming actually inhibits the creativity of the 
individual. The study was successfully
replicated by Dunnette, Campbell, and Jaastad
(1963).²

One plausible reason which might par- 
tially account for the failure of group brain- 
storming to prove superior to pooled individ- 
ual brainstorming is the “self-weighting” ef- 
fect discussed by Kelley and Thibaut (1954). 
According to these authors, members of 
groups should participate and evaluate their 
own ideas according to their felt competency 
within the group. Even in the brainstorming 
group where such effects should be minimized, 
the individual probably will participate to 
the extent that he feels as capable as other 
members and to the extent that he is familiar 
with the brainstorming technique. Although 


2 Although studies by Taylor et al. (1958) and 
Dunnette et al. (1963) found that individuals brain- 
storming in a group were less creative than in- 
dividuals brainstorming alone, the studies do not 
present an accurate test of the brainstorming pro- 
cedure since Osborn devised the brainstorming pro- 
cedure to increase the creativity of groups (groups 
in Osborn’s advertising agency, to be exact). Neither 
study compared the efficacy of group brainstorming 
with conventional techniques of group problemsolv- 
ing. Research which has compared group brain- 
storming to previous, conventional methods of group 
problemsolving, has usually found that, indeed, more 
ideas and more creative ideas are produced in the 
brainstorming groups (Meadow, Parnes, and Reese, 
1959; Weisskopf-Joelson and Eliseo, 1961). 
the brainstorming instructions direct the
members to make no evaluative judgments of 
any ideas which are offered, it probably is 
impossible to eliminate covert judgments 
which are made but are not expressed as overt 
criticisms. Consequently, status or compe- 
tency differences among the members prob- 
ably create inhibitions which, in turn, produce 
a decrease in the number of ideas produced 
by the low status or less competent members. 
This inhibition effect may, in part, account
for the fact that the group brainstorming is
usually found to be inferior to the statistical
pooling of the individual member’s creative
efforts.

In the present study the perceived expert- 
ness of the other group members was varied 
in order to examine the effects such percep- 
tion would have upon the “self-weighting” of 
the members’ contributions and, subsequently, 
upon the creativity of the brainstorming 
group. 


METHOD 
Subjects 


The Ss were 240 undergraduate students (120 males)
enrolled in classes of introductory psychology
at a large midwestern university. 


Instructions 


Three inhibition conditions were created in the 
experiment through the manipulation of perceived 
expertness: no perceived experts in the group, one 
perceived expert, and all perceived experts. In each 
condition 80 Ss were assigned to 4-man groups. 
Assignment to the 20 brainstorming groups in each
condition was random, except that all members of 
the group were of the same sex and were not 
acquainted with one another before the experiment. 
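
The assignment scheme described above (240 Ss, same-sex 4-man groups, 20 groups in each of three conditions) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function name and ID scheme are hypothetical, the split of male and female groups across conditions is left to chance because the text does not report it, and the requirement that members be unacquainted is not modeled.

```python
import random

CONDITIONS = ("No Experts", "One Expert", "All Experts")


def assign_groups(subjects, group_size=4, seed=None):
    """Randomly form same-sex groups, then deal groups out to conditions.

    subjects: list of (subject_id, sex) tuples.
    Returns {condition: [(sex, [subject_ids]), ...]}.
    """
    rng = random.Random(seed)
    groups = []
    for sex in sorted({s for _, s in subjects}):
        pool = [sid for sid, s in subjects if s == sex]
        rng.shuffle(pool)
        for i in range(0, len(pool), group_size):
            groups.append((sex, pool[i:i + group_size]))
    rng.shuffle(groups)
    per_condition = len(groups) // len(CONDITIONS)
    return {
        cond: groups[k * per_condition:(k + 1) * per_condition]
        for k, cond in enumerate(CONDITIONS)
    }


# Hypothetical subject pool: 120 males and 120 females.
subjects = [(i, "M") for i in range(120)] + [(i, "F") for i in range(120, 240)]
design = assign_groups(subjects, seed=1)
```

With 240 Ss this yields 60 groups, 20 per condition, each containing four Ss of the same sex.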

Before each brainstorming session started, Ss 
met with E (the same E ran all Ss), where they were 
given the instruction sheets. These sheets included 
general brainstorming instructions as well as in- 
formation designed to create the specific perceived 
expertness manipulations. 

The instruction sheet for the groups in the All 
Perceived Experts condition included a paragraph 
which read 


Read this carefully as you are the only person in 
this group who has not participated in a brain- 
storming session before. All other members are 
familiar with the procedure to be followed. 


In this condition all four members were led to be-
lieve that the other three members had had some
previous experience with the brainstorming procedure. 

The instruction sheet for the One Perceived Ex-
pert condition included a similar paragraph which
read 


Only one person in each group has participated 
in a brainstorming session before. The rest should 
read the following very carefully to become 
familiar with the procedure to be followed. 


Again all four members in the group received the 
same instruction sheets so that each one thought 
that there would be one member with brainstorming 
experience in his group. 

The Ss in the Control or No Perceived Experts 
condition received instruction sheets which included 
no information about the other members of the 
group. Therefore, each S perceived the others as 
being equal to himself in brainstorming expertise. 

The instruction sheets in all but the Control con- 
ditions were mimeographed on different colored 
sheets to enhance the fact that differences did exist 
among the members of the group. The instruction 
sheets were collected before Ss began working on the 
brainstorming problem so that members of each 
group could not compare instructions. The E also
tried to emphasize the instruction and the expert 
manipulation with such oral comments as “The 
ones who are here for the first time, please read the 
instructions very carefully,” or “If you have any 
questions, ask me now, because the members in your 
group who have experience with brainstorming 
have been instructed not to answer any questions 
during the session.”


Procedure 


After Ss had read the instruction sheets, they 
were assigned to their brainstorming group which 
met in an isolated room. The brainstorming prob- 
lem was taken from a group creativity study by 
Triandis, Hall, and Ewen (1965). It read “How can 
a person of average ability achieve fame and im- 
mortality though he does not possess any particular 
talents?” The Ss recorded their answers on a sheet 
which also contained three 5-step scales which S 
could use to rate (at a later time) the “originality,” 
“creativity,” and “practicality” of each idea offered. 
The groups were given unlimited time to work on 
the problem so that the discussion was terminated
only after the group had no more ideas to offer.


After the session E collected the answer sheets and 
distributed a postmeeting questionnaire to each 
member. The questionnaire included items intended 
to measure S’s feelings about the brainstorming 
session and his participation in the session. 

The measure of inhibition was taken from S’s 
rating of his own feeling about the amount of 
inhibition in his group and also by having S list 
any solutions to the problem which he did not con- 
tribute during the group discussion. The rationale 
behind this procedure is that since S had unlimited 
time during the group discussion to list his ideas, 
the ideas which he would list after the session may 
have been ideas he had held back during the 
group session. Hence one inhibition score was com- 
puted for each S based on a ratio of the number of 
ideas offered after the session over the total number
of ideas listed both during the session and after the 
brainstorming session. This score is based on a ratio 
score found in the study of group creativity by 
Triandis et al. (1965). 
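
The ratio inhibition score described above divides the number of ideas an S listed alone after the session by the total number of ideas that S produced (after the session plus during the session). A minimal sketch follows; the function name is hypothetical, and returning 0.0 for an S who produced no ideas at all is an assumption of this example, since the text does not say how that case was handled.

```python
def inhibition_ratio(ideas_after, ideas_during):
    """Share of an S's ideas that were withheld until after the group session.

    ideas_after: ideas listed alone in the postmeeting session.
    ideas_during: ideas contributed during the group discussion.
    """
    total = ideas_after + ideas_during
    if total == 0:
        return 0.0  # assumption: no ideas at all is scored as no inhibition
    return ideas_after / total


# Hypothetical S: 7 ideas offered in the group, 4 more listed afterward.
example_score = inhibition_ratio(ideas_after=4, ideas_during=7)
```

A score near 0 means the S voiced nearly every idea in the group; a score near 1 means most ideas were held back until the S was alone.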


RESULTS 


The effectiveness of the perceived expert- 
ness manipulation can be seen from the items 
on the postmeeting questionnaire which in- 
dicate Ss’ own feelings about the group ses- 
sion. Four items were rated on a 5-point scale 
and were stated as: “Were you reluctant in 
offering an idea for fear of criticism from 
other members?” “Were you at all inhibited 
due to the presence of others who had more 
experience with brainstorming?” “When you 
offered an idea that was ‘way out,’ did you 
sense a certain disapproval from other mem- 
bers, although no overt criticism was ex- 
pressed?” and “Did such fear of possible dis- 
approval from other members make you with- 
hold any ideas?” The analyses of variance 
which were conducted on these ratings (see 
Table 1) showed that the perception of in- 
hibition in the All Experts condition was 
always significantly higher than the perceived 
inhibition in the Control condition; the One 
Expert condition had a moderate amount of 
perceived inhibition which fell between the 
two extreme conditions. In no instance is 
there a reversal from the pattern of inhibition 
which was predicted from the expertness 
manipulation. These data indicate that the 
expertness of the other group members did
make the individual member feel reluctant to 
contribute all the ideas which came to mind. 
A direct test of the inhibition manipula- 
tion can also be obtained from a count of the 
number of ideas produced in each of the ex- 
pertness conditions. The Ss in the No Experts 
condition produced an average of 16.06 ideas, 
which is significantly higher (p< .01) than 
the average of 9.25 ideas produced by Ss in 
the One Expert condition. In turn, the mean 
of the One Expert condition was significantly 
higher (p< .01) than the mean of the All 
Experts condition (4.71). The ratio inhibition
score was computed from a count of the num- 
ber of ideas produced alone in the postmeeting 
session compared to the total number of 
ideas which each S produced (alone plus 
group ideas). This ratio score showed again 
that the expertness factor inhibited the num- 
ber of ideas which Ss were willing to con- 
tribute. The ratio inhibition scores are also 
shown on Table 1, where it can be seen that 
the All Experts condition was significantly 
higher (more inhibition) than the One Ex-
pert, but this latter condition was not sig-
nificantly higher than the Control condition
when this particular inhibition score was used. 
Since Ss had rated their own ideas on the 
dimensions of creativity, originality, and prac- 
ticality, it was possible to compare the 
“quality” of ideas produced in each of the 
inhibition conditions. These scores are shown 


TABLE 1

MEAN INHIBITION SCORES IN THE VARIOUS EXPERT CONDITIONS

                                                 Expert condition
Postmeeting questions                   All Experts   One Expert   No Expert
                                        (n = 80)      (n = 80)     (n = 80)
“Reluctant” question                    212 [sic]     2.55ᵃ        1.36
                                        (.914)        (.674)       (.580)
“Felt inhibited” question               2.41          2.39ᵃ        1.70
                                        (.837)        (.741)       (.879)
“Sensed disapproval” question           3.07          2.45ᵃ        1.70
                                        (.938)        (.656)       (.819)
“Withheld ideas” question               2.97ᵃ         2.36         1.62
                                        (.900)        (.663)       (.722)
Ratio score (No. ideas alone/           .364          .057         .019
  No. ideas alone + ideas in group)     (.256)        (.009)       (.004)

Note. The numbers in parentheses are the standard deviations for the variables.
ᵃ Indicates the mean is significantly higher (p < .01) than the mean in the column to the right.

TABLE 2

RATINGS OF THE IDEAS PRODUCED IN EACH INHIBITION CONDITION

                          Expert condition
                No Experts   One Expert    All Experts
                (n = 80)     (n = 80)      (n = 80)
Creativity      2.97         3.06ᵃ         2.69
Originality     3.64ᵃ        3.39ᵃ         [illegible]
Practicality    3.36ᵃ        [illegible]   3.08

ᵃ Indicates the mean is significantly higher (p < .05) than
the mean in the column to the right.


in Table 2 and indicate quite clearly the 
detrimental effect of the inhibition manipula- 
tion. The Ss in the Control condition rated 
their ideas higher on practicality and origi- 
nality than did Ss in the All Experts condi- 
tion. The scales were highly intercorrelated 
and since no specific definitions of these items 
were given, the ratings probably represent a 
gross evaluative rating.³

One final item on the postmeeting question- 
naire provides some additional data regarding 
the effects of the expert manipulation. This 
item reads “As a personal experience, did you 
find the session pleasant?” In the All Experts 
condition the mean rating was 2.17 (on a 
5-point scale) which is significantly lower 
(p < .01) than the mean rating of the One 
Expert condition (2.59) which was, in turn, 
significantly lower (p < .01) than the mean 
pleasantness rating in the Control condition 
(3.81). 


Discussion 


The major problem of this study was the 
effect of perceived expertness of others upon 
individual creativity in brainstorming groups. 
The results showed quite conclusively that 
individuals were reluctant to contribute all of 
their ideas when they were in groups with 
members who were thought to have had 
previous training and experience with the
brainstorming procedure. Since the creativity 
of the brainstorming group is highly depend- 


3 It should also be noted that Ss’ ideas were
rated by two outside judges using the same scales. 
These ratings were highly correlated with Ss’ own 
ratings (p < .001) and, consequently, Ss’ own ratings 
were used as the criterion scores. 
ent upon all members in the group contribut-
ing all possible ideas and solutions to the
problem, total group creativity will be notice- 
ably lessened when one member of the group 
appears to be more expert than the other 
members. The members of the experimental 
groups specifically stated that they “felt in- 
hibited” during the brainstorming session, 
and they also evidenced this fact by listing in 
the postsession “alone” situation many ideas 
which they had not contributed to the group 
discussion. 

The inhibition which the members of the 
“expert” groups felt may be due to the im- 
plied threat of the more knowledgeable mem- 
bers (Hoffman, 1965) and the possibility of 
censure by these expert members. Although 
the brainstorming instructions specified that 
no criticism was allowed, Ss in the expert 
group did note in the postmeeting question- 
naire (see Table 1) that they “sensed dis- 
approval from other members.” Consequently 
these group members did not contribute a 
large portion of ideas which came to mind; 
thus creativity in the inhibition conditions 
was significantly lower than in conditions 
where members did not feel the threat of 
censure by more knowledgeable members. 

The results of the study confirm the find- 
ings of other investigators regarding the ef- 
fects of status differences and ability differ- 
ences on group problemsolving (Hoffman, 
1965). For example, Torrance (1955) has 
found that the low status person may be in- 
hibited and “go along” with opinions ex- 
pressed by the high status person even though 
he feels his own opinions are better. Mausner 
(1954) also found that Ss conform to their
partner’s erroneous judgments more often
when the partner had had previously success-
ful experience with the task than when the
partner had had previously unsuccessful ex- 
perience. Our results indicate that, indeed, 
the presence of an expert in the brainstorm- 
ing group made the individual feel inhibited 
and subsequently caused him to contribute 
only a few of his ideas to the group problem. 

The individuals who were brainstorming in 
groups with purported experts also rated their 
groups as being less pleasant than groups 
without expert members. These ratings of 
unpleasantness made by Ss in the “expert” 
conditions and the fact that at the end of
the brainstorming session these members still 
had many ideas which were unexpressed un- 
doubtedly reflect Ss’ reluctance to contribute 
all of their ideas. Members of the Control 
condition expressed a much larger proportion 
of their ideas and, subsequently, they felt 
that the group session was much more pleasant 
than did Ss in the “expert” conditions. 

Overall the results of the study indicate 
that social factors inherent in unequal status 
structures within the group are detrimental 
to member creativity even though brain- 
storming instructions are given. Group mem- 
bers feel threatened and inhibited by the 
presence of more knowledgeable members;
consequently, the less expert members con-
tribute few of their ideas and suggestions.
That is, group members “self-weight” their
own contributions according to their previous 
experience with the brainstorming technique 
(Kelley and Thibaut, 1954). This finding 
may explain, in part, why many studies which 
have compared the creativity of individuals 
who are brainstorming alone and in groups 
unanimously endorse the superiority of the 
“alone” condition. 

The implications of such a finding would 
suggest that careful attention should be given 
to the selection of members for the brain- 
storming group by making previous experience 
as similar as possible. When research involves 
the intact group consisting of members with 
extremely different backgrounds in _brain- 
storming techniques, then effort should be 
made to conceal or minimize these differences 
within the group. 


VALIDITY OF ESTIMATES BY CLERICAL PERSONNEL
OF JOB TIME PROPORTIONS 


STEPHEN J. CARROLL, Jr.,1 anp WILLIAM H. TAYLOR, Jr.? 
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This study compares the estimated time allocations of 16 clerical workers with 
their actual time allocations as determined by work sampling procedures 
carried out surreptitiously over a 2-wk. period. The average correlation be- 
tween estimated and actual time allocations for individual workers was .88. 
Only 2 of the 16 correlations were under .80. The biggest difference between 
an estimated and actual time allocation for a particular work activity was 
6% for the group of workers as a whole. This study seems to indicate that time 
estimates from rank and file workers can be accurate enough to be useful in 
employee recruitment, selection, evaluation, training, and compensation. 


Only a few studies have been completed 
which have focused on the accuracy of em- 
ployee estimates of how they allocate their 
work time among various work activities. 
Since time estimates are used in job analysis 
and are certainly the cheapest and simplest 
of the methods used to measure work, the 
validity of the information obtained by this 
method should be of interest to all those 
involved in the derivation and use of job 
information. 

In a study by Stogdill and Shartle (1955), 
estimates of the time spent in various work 
activities made by 34 naval officers were 
compared to a log of time spent in work ac- 
tivities maintained by the officers for three 
days. It was found there was a fairly high 
relationship between estimated and actual 
logged time for specific work activities such 
as talking, reading, writing reports, and 
operating machines. More subjective activities 
such as planning and reflection were less ac- 
curately estimated. Mahoney, Jerdee, and 
Carroll (1963) in a study of 4 managers in 
one company and 28 managers in another 
found that job classifications based on time 
estimates correlated moderately well with job 
classifications based on time allocation as 
determined by work sampling. Finally a study 
of 232 technical men by Hinrichs (1964) 
indicated that estimates of the proportion of 


1 Requests for reprints should be sent to Stephen 
J. Carroll, Jr., Department of Business Administra- 
tion, University of Maryland, College Park, Mary- 
land 20740. 

2 Now with the United States Naval Weapons
Station, Charleston, South Carolina. 
time spent in communication activities were
very close (within 5%) to the time propor- 
tions obtained with work sampling. 

The purpose of the present study was to 
evaluate the accuracy of time estimates made 
by a group of lower level workers. Previous 
studies have focused only on higher level 
workers, and the extent to which their find- 
ings may be generalized is still virtually un- 
known. 


METHOD 


At the start of the study, 16 clerical workers were 
asked to estimate the proportion of time each spent 
on various job activities during a routine workday. 
Next an independent observer obtained random 
observations of their work activities 16 times a 
day for a period of 2 wk. These random observa- 
tions were then classified by the observer into 
appropriate work activity categories. The proportion 
of observations falling in each category then formed 
the basis for inferences about the actual time al- 
locations as determined by work sampling. The 
participants were unaware that their work activities 
were being observed since they were not told and 
since the observer regularly spent a considerable 
amount of time in the office. While this procedure 
violates normal work sampling procedure it was 
considered justified in this case. No information 
detrimental to any individual was given to manage- 
ment. 
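The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function names `sampled_proportions` and `pearson_r` are hypothetical, and the observation list and estimates are invented data for a single worker, not values from the study.

```python
from collections import Counter
from math import sqrt


def sampled_proportions(observations):
    """Percentage of random observations falling in each activity category."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {activity: 100.0 * n / total for activity, n in counts.items()}


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for one worker (assumed for illustration only).
observed = ["typing"] * 40 + ["filing"] * 25 + ["telephone"] * 15 + ["idle"] * 20
estimated = {"typing": 45.0, "filing": 30.0, "telephone": 20.0, "idle": 5.0}

actual = sampled_proportions(observed)
activities = sorted(estimated)
r = pearson_r([estimated[a] for a in activities], [actual[a] for a in activities])
print(actual)
print(round(r, 2))
```

The correlation is computed across activity categories within one worker, which is how the per-employee coefficients reported later in the paper appear to have been obtained.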


RESULTS 


Table 1 presents the results of this study. 
Although the estimated time allocations and the
time allocations determined by work sampling
differ significantly in a statistical sense (chi-
square, .01 level), the actual differences are
not large, being at the most 6% (for machine
operation). Idle and personal time which
TABLE 1

PROPORTION OF TIME SPENT IN VARIOUS WORK ACTIVITIES AS DETERMINED
BY ESTIMATES AND BY WORK SAMPLING

                                  Job time proportions   Job time proportions   Differences in
                                  as determined by       as determined by       average time
                                  work sampling          estimates              allocation
Work activity                        n        %                  %                     %
Conversation                       246     10.3                7.5                   2.8
Filing                             202      8.5               10.8                   2.3
Idle and Personal                  169      7.1                2.4                   4.7
Machine operation                  155      6.5               12.4                   5.9
Mail handling                       11      0.5                0.8                   0.3
Telephone                          108      4.6                7.9                   3.3
Typing                             179      7.5                6.3                   1.2
Walking                             89      3.7                5.0                   1.3
Writing, research and review a    1074     45.1               46.8                   1.7
Other                               75      3.2                0.0                   3.2
Unknown                             72      3.0                0.0                   3.0
Totals                            2380    100.0              100.0

a These were originally separated in the study but were combined because the observer and Ss had difficulty in differentiating
among them.


might have been expected to be the least accurately
estimated activity since employees would be most
sensitive about this was off only 5%. Machine
operation might have been off the most because this
was the activity to which the idle and personal
underestimation was shifted. The proportion of time
taken up by these two activities together is about
14% when determined by both estimates and work
sampling. Attention should be directed also at the
“other” and “unknown” work activity categories, for
these categories obviously could not be used at all
in the time estimates very well. This would mean
that the time allocations for the estimates would
have to differ from the time allocations from work
sampling by at least the proportion of observations
classified into these two categories. Keeping this
in mind, it appears the two time allocations as
determined by two different methods are quite similar.

This similarity in time allocations between estimated
and actual is also true for individuals as well as
for the group as a whole. This is indicated in
Table 2 which presents the correlation coefficients
between estimated and actual job activity times for
each of the 16 employees studied. Only two of these
correlation coefficients are below .80 and only one
is not significant at the .01 level.


TABLE 2

RELATIONSHIP BETWEEN TIME ALLOCATIONS AS DETERMINED BY ESTIMATES
AND BY WORK SAMPLING FOR 16 SUBJECTS

Employee    Correlation (Pearsonian)
A           30
B           08
C           .92
D           .99
E           oe
5           "92
H           .97
I           .98
J           .99
=P =
M           .80
N           22
            15
            .90

Average     .88


DISCUSSION


The study indicated that time estimates made by
personnel on a lower level clerical job were at least
as accurate as those reported for personnel on several
higher level
jobs. The study also would seem to indicate 
that time estimates from a group of such 
personnel can be of value since they are so 
easy to obtain and can be accurate enough
to serve as a general guide to the nature of
the work performed on various jobs. For
example, such job information would certainly 
prove useful in employee recruitment, selec- 
tion, evaluation, training, and compensation. 


TASK STRUCTURE, WORK STRUCTURE, AND TEAM
PERFORMANCE 1


JAMES C. NAYLOR AND TERRY L. DICKINSON 


Ohio State University 


Three different work structures were examined factorially with two levels of task 
structure and two levels of task organization using two-man teams in a multiple-cue 
inference task in an initial test of the Dickinson-Naylor taxonomy of team perform- 
ance. All teams performed for 200 trials. Task structure significantly influenced 
team achievement, consistency, and matching, while task organization influenced 
only team achievement and matching behavior. Work structure failed to show any 
effect upon performance except in terms of the degree to which team responses could 
be predicted from individual member responses. 


A model of the relevant dimensions of team 
performance has been suggested by Dickinson 
and Naylor (1966) which can be stated as 
follows: Team performance = $f_0$ (Task structure,
Work structure, Communication structure).

In this model, task structure is broadly conceived
as the demand characteristics of the
task to be accomplished. Also, it is viewed as a 
“fixed” dimension in the sense that if the 
demand characteristics change, the task itself, 
by definition, is also changed. More specifically, 
task structure is formally defined to be a func- 
tion of the individual and joint demand char- 
acteristics of the separate task components, 
namely, component complexity, component 
organization, and component redundancy. 
Stated formally, Task structure = $f_1$ (Component
complexity, Component organization,
Component redundancy).

Each of the task structure characteristics 
may in turn be defined. Thus the complexity 
of a task component is defined in terms of its 


1 This research is based on the master’s thesis of the 
second author and was supported by Research Grant 
GB-4987 from the National Science Foundation 
awarded to the first author for research on choice be- 
havior in multiple-cue situations. 

2 Requests for reprints should be sent to James C. 
Naylor, Department of Psychology, Purdue Uni- 
versity, Lafayette, Indiana 47907. 
information-processing and/or memory-storage
demand requirements. Component organiza- 
tion is defined in terms of similar demands 
imposed by the total task due to the interrela- 
tionships existing between or among task 
components. Finally, task redundancy refers 
to the degree of overlap existing among the 
demands imposed by the several individual 
task components. 

Turning now to the characteristic of work 
structure, Dickinson and Naylor (1966) have 
defined this dimension of team performance 
as the manner in which the task components 
are distributed among team members. Work 
structure, then, may be viewed as a subtask 
work assignment problem in the study of team 
performance. It involves (a) the definition of 
the operations to be performed, (b) the sequence
in which these operations must occur,
and (c) the way in which interaction among 
team members must occur (Naylor & Briggs, 
1965). 

The third and final dimension of team per- 
formance, communication structure, is defined 
in terms of the communication interrelation- 
ships which exist between team members. It is 
exceedingly important to note that in the 
Dickinson-Naylor model the communication 
structure dimension differs from the work and 
task structure dimensions in one very important
way: communication structure is
viewed as being a dependent variable rather 
than an independent variable; that is, given 
any particular combination of work and task 
structure, team members themselves will 
develop (within the limits of that particular 
unique situation) a particular communication 
structure. This point of view is similar to that 
proposed by Faucheux and MacKenzie (1966) 
where they contend that the communication 
structure developed by a team is only one of 
many which could be developed under the ex- 
isting conditions. They conclude that com- 
munication structure is a dependent variable 
intervening between task and behavior. 

The importance of the work and task struc- 
ture variables with respect to communication 
structure is that they are viewed as placing 
limitations on the type of communication 
structure which a team will develop. Thus the 
work structure may be such that several 
individuals are required to perform the same 
subtask. In this situation the potential for 
interaction among team members is at a 
maximum and thus greatly facilitates the 
opportunity for communication. Similarly, 
task structure may also be an influence on 
communication structure. For example, as the 
complexity of the team task increases, the 
number of subtasks a team member can per- 
form may be reduced, possibly leading to 
the need for a highly developed communica- 
tion structure. The organization of the task 
may also be important. At one extreme there 
may be little or no interrelationship between 
the subtask components of any two members, 
thus restricting the necessity for communica- 
tion. At the other extreme, a substantial inter- 
relationship between subtask components may 
mean that intermember communication is 
absolutely necessary for subtask success, result- 
ing in a great deal of pressure on members to 
develop patterns of communication. 

It should also be pointed out that the above 
definition of communication structure involves 
only those communication systems or networks 
worked out by team members and does not 
include aspects of communication among 
members which are intrinsic to the nature and 
design of the task. An example of this dis- 
tinction is a recent study by Williges, Johnston, 
and Briggs (1966). In one condition of that 
study, team members could obtain information
about their partner’s activity via a radar 
display while in the other condition they saw 
only their own targets and interceptors. Under 
the present taxonomy, these two conditions 
would be viewed as different work structures, 
and the communication structure would be 
different for the two conditions only if different 
interpersonal communication patterns were 
found to emerge as a function of the work 
structure differences. Incidentally, this ap- 
parently happened in the Williges et al. (1966) 
study—at least in terms of frequency of com- 
munication among members. This would 
again support the notion of communication 
structure as a variable dependent upon work 
and task structure. 

Clearly the adequacy of the Dickinson- 
Naylor model depends upon the extent to 
which it can account for the performance of 
teams. The crucial issues, of course, are the 
task and work structures imposed upon the 
team, since as previously explained communi- 
cation structure is typically not manipulated 
per se but is dependent upon the first two 
dimensions. The purpose of this initial research 
was to examine team performance as a func- 
tion of both the work structure and task 
structure variables. 


METHOD


Subjects. Two hundred forty female undergraduate
students in introductory psychology at Ohio State Uni- 
versity served as Ss. A team consisted of two members, 
resulting in 120 teams. The Ss were assigned randomly 
to teams and teams were assigned randomly to experi- 
mental groups. 


Experimental Task 


The experimental paradigm was a standard multiple- 
cue probabilistic inference task (see Dudycha & Naylor, 
1966; Naylor & Schenck, 1968), In this task, Ss are 
shown a series of two-digit numbers (X,; values or cue 
values), one at a time, and asked to make predictions 
(Ys; values) as to what two-digit criterion number 
(Y,; value) is associated with each cue number. After 
each prediction Ss are shown the “correct’’ answer. 
Thus, on each trial 


X, = cue or stimulus value on Trial i 
Ys, = an S’s prediction of criterion on Trial i 
Y., = actual criterion value on Trial i 


For any given S the correlational relationship between
cue and criterion values is fixed over the n experimental
trials; that is, each of the n pairs of numbers
TABLE 1

STIMULUS, CRITERION, AND RESPONSE DATA IN A TWO-CUE, TWO-PERSON TEAM INFERENCE TASK

          Actual state of nature              Human performance
          Cue variables     Correct      Individual team          Composite team
                            response     member responses         response
Trial     $X_1$    $X_2$    $Y_e$        $Y_{s_1}$   $Y_{s_2}$    $Y_c$
1
2
3
4
5
6
...
n

represents a point from a bivariate normal frequency 
distribution which describes that specific correlational 
relationship. Thus S’s task is to become what Peterson 
and Beach (1967) would call an “intuitive statistician” 
and learn the underlying relationship between X and Y, 
so that they may predict $Y_e$ with increasing accuracy
as their experience with the task increases. 


Performance Measures 


The linear regression model, when applied to a team 
multiple-cue inference task, provides a number of 
interesting and powerful indexes of both individual 
and team performance (for a detailed discussion of these 
measures see Hursch, Hammond, & Hursch, 1964, and 
Naylor & Schenck, 1965). Consider a team inference 
situation involving two cue variables and two team 
members. Each team member is asked to look at both 
cues and then make his own prediction concerning the 
correct answer on that trial. Following this, the two 
team members are asked to “get together” and come 
up with a composite or team prediction ($Y_c$). Such a
situation is presented in Table 1. Now for each team,
let


$\hat{Y}_e = b'_{1e}X_1 + b'_{2e}X_2$
= optimal linear prediction equation (1)


$\hat{Y}_c = b'_{1c}X_1 + b'_{2c}X_2$
= policy equation of team (2)


If, after n trials, solutions are found for both Equations
(1) and (2) and then $\hat{Y}_e$ and $\hat{Y}_c$ values for each of
the n trials are actually computed in turn, the following
team performance measures may formally be defined:


1. $R_e = r_{Y_e\hat{Y}_e}$ = system ecology

2. $r_a = r_{Y_eY_c}$ = team achievement

3. $R_c = r_{Y_c\hat{Y}_c}$ = team consistency

4. $r_m = r_{\hat{Y}_e\hat{Y}_c}$ = team policy matching


3 All correlations are over n experimental trials.


All of the above measures, with the exception of $R_e$
which defines the system ecology, are indexes describing 
different aspects of team inference behavior. It is pos- 
sible to obtain numerous other measures concerning 
individual team member performance as well, but this 
paper will examine only team performance under 
various conditions. 
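As a rough illustration of how the four indexes could be computed, the sketch below fits the two linear equations by ordinary least squares and correlates the resulting series. It is a minimal sketch under stated assumptions: the helper names (`fitted_values`, `corr`, `team_indexes`) are hypothetical, the data are simulated rather than taken from the experiment, and the subscript c is used here for the composite team response.

```python
import numpy as np


def fitted_values(X, y):
    """Least-squares fitted values of y regressed on the columns of X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


def corr(a, b):
    """Pearson correlation between two one-dimensional arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def team_indexes(X, y_e, y_c):
    """Team performance indexes from cues X, criterion y_e, and team response y_c."""
    y_e_hat = fitted_values(X, y_e)  # optimal linear prediction of the criterion
    y_c_hat = fitted_values(X, y_c)  # best-fitting linear policy of the team
    return {
        "R_e": corr(y_e, y_e_hat),      # system ecology
        "r_a": corr(y_e, y_c),          # team achievement
        "R_c": corr(y_c, y_c_hat),      # team consistency
        "r_m": corr(y_e_hat, y_c_hat),  # team policy matching
    }


# Simulated data (assumed for illustration): 200 trials, two uncorrelated cues.
rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 2))
y_e = 0.632 * X[:, 0] + 0.632 * X[:, 1] + 0.447 * rng.standard_normal(n)
y_c = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.6 * rng.standard_normal(n)

indexes = team_indexes(X, y_e, y_c)
print({name: round(value, 3) for name, value in indexes.items()})
```

Because the simulated team weights the first cue more heavily than the second, the matching index falls somewhat below unity even though the team's policy is perfectly linear apart from noise.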


Definition of Variables 


The experiment involved the study of both the work 
structure and task structure variables. However, as was 
pointed out earlier, task structure is itself a function of 
three variables—complexity, organization, and redun- 
dancy. For purposes of the present study, task structure 
was formally defined in terms of the multiple regression 
model as follows 4:


$R^2_{Y \cdot X_1, X_2, X_1X_2} = R^2_{Y \cdot X_1 X_2} + \left(R^2_{Y(X_1X_2) \cdot X_1, X_2}\right)\left(1 - R^2_{Y \cdot X_1 X_2}\right)$. (3)


Equation (3) may be expressed in words as Task 
structure = Task complexity and Task redundancy 
+ Task organization. 

Thus in terms of the multiple regression model, task 
structure is represented by the multiple coefficient of 
determination which exists when predicting the cri- 
terion from both the separate cues and from all com- 
binations of cues. This term may be decomposed 
algebraically into two subcomponents. The first term 
on the right side of the equality in (3) reflects the 
amount of predictability attributed to the cues as 
individual predictors. It includes both individual cue 
predictabilities (cue complexity) and the intercorrela- 


4 This definition of task structure and the resulting 
method of measuring the relationships existing between 
or among task components can be generalized by re- 
placing all the multiple and multiple partial correlation 
coefficients by multiple and multiple partial correlation 
ratios (Rao, 1965). 
TABLE 2

SUMMARY OF EXPERIMENTAL CONDITIONS FOR THE TWELVE DIFFERENT GROUPS

Group    Work structure    Task structure    Task organization
1        Model 1           .80               .00
2        Model 1           .40               .00
3        Model 1           .80               .40
4        Model 1           .40               .40
5        Model 2           .80               .00
6        Model 2           .40               .00
7        Model 2           .80               .40
8        Model 2           .40               .40
9        Model 3           .80               .00
10       Model 3           .40               .00
11       Model 3           .80               .40
12       Model 3           .40               .40

Note. Each group consists of 10 two-person teams.


tions between cues (cue redundancy). In the two-cue 
case it may be written 


$R^2_{Y \cdot X_1 X_2} = \dfrac{r^2_{YX_1} + r^2_{YX_2} - 2\, r_{YX_1} r_{YX_2} r_{X_1X_2}}{1 - r^2_{X_1X_2}}$ (4)


Equation (4) reflects the general interplay between 
component redundancy and component complexity. 
Note that any increase in the redundancy among 
components detracts from the impact of component 
complexity. Given that no redundancy exists between 
task components, Equation 4 reduces to 


$R^2_{Y \cdot X_1 X_2} = r^2_{YX_1} + r^2_{YX_2}$
= sum of component complexities (5)


The second term to the right of the equality in Equa- 
tion (3) represents task organization. It is the squared 
multiple partial correlation of the criterion with the 
unique combinations of the cues times the residual of 
task complexity, that is, the unique contribution to 
the multiple contributed by the various combinations 
of cues. As a definition of task organization, the term 
represents an independent dimension and thus satisfies 
the demand of the model that organization be a task 
characteristic which the team members must learn 
independently of complexity and redundancy. Any 
value of $R^2_{Y(X_1X_2) \cdot X_1, X_2} > 0$ will, of course, result in
an increase in task structure. 
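The arithmetic of Equations (3) through (5) can be checked with a short sketch. The function names `r2_two_cues` and `task_structure` are hypothetical, and the numerical inputs are chosen only for illustration: cue validities of .447 match the weights reported for the low-complexity conditions, while the squared multiple partial value of 2/3 is an assumed value picked so that the organization term comes out near .40.

```python
def r2_two_cues(r_y1, r_y2, r_12):
    """Equation (4): squared multiple correlation of the criterion with two cues."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def task_structure(r2_cues, r2_partial):
    """Equation (3): returns (total task structure, task organization term)."""
    organization = r2_partial * (1 - r2_cues)
    return r2_cues + organization, organization


# With uncorrelated cues, Equation (4) reduces to Equation (5).
no_redundancy = r2_two_cues(0.447, 0.447, 0.0)

# Introducing redundancy (cue intercorrelation of .50) lowers the total.
with_redundancy = r2_two_cues(0.447, 0.447, 0.5)

# Assumed squared multiple partial of 2/3 for the cross-product term.
total, organization = task_structure(no_redundancy, 2.0 / 3.0)

print(round(no_redundancy, 3), round(with_redundancy, 3))
print(round(total, 3), round(organization, 3))
```

The second call shows the point made about Equation (4): holding the individual cue validities fixed, any positive cue intercorrelation reduces the predictability attributable to the cues as individual predictors.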

To summarize, using the linear regression model 
to define task structure and its determinants results 
in the following: 


Task structure = total predictability of
the criterion using the
cues both individually
and in combination

Component complexity = predictability of the individual cues

Component redundancy = cue intercorrelation

Component organization = predictability of the cues
when used in combination
with each other


Experimental Design 


The design was a 2 × 2 × 3 × 5 repeated-measures
factorial with two levels of task structure (.80 and .40), 
two levels of task organization (.00 and .40), three levels 
of work structure, and five blocks of 40 trials each. 
Task redundancy was kept at .00 (cues were uncor- 
related) for all conditions. A summary of the design 
is given in Table 2. 

The three work structures differed in the number of 
cues available as source of information to a team mem- 
ber and/or in how many of these cues Ss were instructed 
to use. In Model 1, shown in Figure 1, each S had both 
cues presented and was instructed to predict from both 
of the sets. In Model 2 both sets were again presented, 
but Ss were instructed to predict from only one of the 
sets. Thus, one team member predicted using the $X_1$
cue and the other using the $X_2$ cue, although both cues
were visible to each S. In Model 3 each team member 
had only one of the cues available for prediction. 

To create task stimuli which would satisfy the speci- 
fied values for task structure (.80, .40) and task organ- 
ization (.00, .40) and would keep task redundancy 
constant at .00, three sets of 200 uncorrelated z scores 
were generated through the use of a computer program 


FIG. 1. Two-member team work structures (Models 1, 2, and 3).
(Wherry, Naylor, Wherry, & Fallis, 1965). Two of the
sets were multiplied together to obtain an interaction 
or cross-product term. Each of the numbers in the four 
sets was then multiplied by the desired regression 
coefficient and summed to obtain the criterion values 
(see Table 3). Using the above procedure, 200 criterion 
numbers were generated from each equation whose 
multiple correlation with all four sets, theoretically, 
was unity, and whose correlation with each cue set 
was equal to the respective regression coefficient. 
Due to rounding errors, however, the desired empirical 
values deviated from those obtained, as shown in 
Table 3. 

Two of the sets, $X_1$ and $X_2$, were the actual cues
presented to Ss. In all the defining equations $X_1$ and
$X_2$ were weighted equally and the sum of their
predictabilities of the criterion represented the task
complexity. The interaction term’s ($X_1X_2$) predictability
of the criterion defined the level of task organization,
and the fourth set, $X_3$, served as an error term. The
sum of $X_1$, $X_2$, and $X_1X_2$ predictabilities represented
the total amount of task predictability, that is, task 
structure. 
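The generation procedure described above can be approximated as follows. This is a sketch under assumptions, not a reproduction of the Wherry et al. (1965) program: the function name `generate_task` is hypothetical, the cue sets are drawn as pseudorandom standard normal scores (so they are only approximately uncorrelated in any finite sample), and the weights shown are those of the condition with no cue complexity and task organization of .40.

```python
import numpy as np


def generate_task(weights, n=200, seed=0):
    """Build two cue sets, their cross-product, an error set, and the criterion."""
    rng = np.random.default_rng(seed)
    x1, x2, x3 = rng.standard_normal((3, n))  # three sets of n z scores
    x1x2 = x1 * x2                            # interaction (cross-product) term
    b1, b2, b12, b3 = weights
    y = b1 * x1 + b2 * x2 + b12 * x1x2 + b3 * x3
    return {"X1": x1, "X2": x2, "X1X2": x1x2, "X3": x3, "Y": y}


# Weights for a criterion driven only by the interaction term plus error.
task = generate_task((0.000, 0.000, 0.632, 0.775))

for name in ("X1", "X2", "X1X2"):
    r = float(np.corrcoef(task[name], task["Y"])[0, 1])
    print(name, round(r, 3))
```

With only 200 trials the empirical correlations will wander around the theoretical weights, which is in line with the small discrepancies between theoretical and empirical values that the authors report.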


Experimental Procedure 


A group testing procedure was employed. All 10 
teams for a particular experimental condition were 
tested in the same 2-hr. session. A session involved 200 
trials, where a trial consisted of (a) displaying the 
cue(s), (b) getting individual S predictions of the
criterion, (c) getting a team prediction, and (d) dis- 
playing the correct answer for that trial. 

The Ss were instructed that each was to make an 
individual prediction of the value of the criterion from 


TABLE 3

THEORETICAL VALUES USED IN GENERATING CUES AND
THE RESULTING EMPIRICAL RELATIONSHIPS

Criteria generation equations

$Y_{e_1}$ = .632 $X_1$ + .632 $X_2$ + .000 $X_1X_2$ + .447 $X_3$
$Y_{e_2}$ = .447 $X_1$ + .447 $X_2$ + .447 $X_1X_2$ + .632 $X_3$
$Y_{e_3}$ = .447 $X_1$ + .447 $X_2$ + .000 $X_1X_2$ + .775 $X_3$
$Y_{e_4}$ = .000 $X_1$ + .000 $X_2$ + .632 $X_1X_2$ + .775 $X_3$

Empirical correlations of cues with criteria

                                              Task predictability ($R^2$)
Criterion    $X_1$    $X_2$    $X_1X_2$    Empirical    Theoretical
$Y_{e_1}$     .617     .630     -.083       .781         .800
$Y_{e_2}$     .378     .482      .596       .754         .800
$Y_{e_3}$     .442     .417     -.111       One          .400
$Y_{e_4}$    -.041     .009      .608       .370         .400
the cue(s) available to them and to record their predictions
on an answer sheet. Following their individual
predictions, members were then to “get together” with 
their partners and make a composite or team prediction 
on the basis of each partner’s individual prediction. 
The Ss were told that their task was to predict the 
criterion as accurately as possible using the cue(s), and 
that the cue(s) and criterion were related. They were 
told not to expect to predict perfectly, but that with 
practice they should be able to increase in skill. Cue 
sets were presented to Ss on stapled sheets, while the 
criterion numbers were visually displayed from an 
opaque projector. The task was group paced—the 
teams were allowed approximately 40 sec. to make 
predictions and then the criterion was displayed for 
5 sec. before E proceeded to the next trial. Three sample
trials were given prior to the start of the actual testing 
session, and a 5-min. rest period was introduced in all 
sessions at the end of 100 trials. 


RESULTS 
A multiple regression analysis of the form

$\hat{Y}_c = a + b_{1c}X_1 + b_{2c}X_2 + b_{12c}X_1X_2$ (6)


was computed for the 200 team responses for
each of the 120 teams. This equation was then
used to obtain $\hat{Y}_c$ values on each of the 200
trials for that team. Since Equation (6) represents
a team’s prediction “strategy,” the $\hat{Y}_c$
values represent what the team would have
predicted had they used that strategy perfectly
(without error) during the testing session.

A similar analysis was run on the criterion 
data for each group. This equation was of the 
form 


$\hat{Y}_e = a + b_{1e}X_1 + b_{2e}X_2 + b_{12e}X_1X_2$ (7)


and represented the optimal prediction strategy 
for that experimental condition. The $\hat{Y}_e$ values
were then computed for each of the 200 trials
in that experimental condition; these represented
the optimal prediction for that trial.
Once these values were computed there were
four scores available for each S on each of the
200 trials: $Y_e$, $\hat{Y}_e$, $Y_c$, and $\hat{Y}_c$. These were used to
obtain the performance measures described 
in the previous section. This was done sepa- 
rately for each block of 40 trials for each team. 
All correlations were transformed to Fisher z 
values for use in the analyses of variance. 
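The block-by-block scoring and the Fisher z transformation might be sketched as below. The function name `blockwise_fisher_z` and the simulated series are assumptions made for illustration; the sketch shows one correlation (criterion with team response) computed within each block of 40 trials and then transformed with the inverse hyperbolic tangent, which is the Fisher z.

```python
import numpy as np


def blockwise_fisher_z(a, b, block_size=40):
    """Correlate a with b within consecutive blocks and return Fisher z values."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    zs = []
    for start in range(0, len(a), block_size):
        stop = start + block_size
        r = np.corrcoef(a[start:stop], b[start:stop])[0, 1]
        zs.append(np.arctanh(r))  # Fisher z = 0.5 * ln((1 + r) / (1 - r))
    return np.array(zs)


# Simulated criterion and team responses over 200 trials (assumed data).
rng = np.random.default_rng(0)
criterion = rng.standard_normal(200)
team = 0.7 * criterion + 0.7 * rng.standard_normal(200)

z = blockwise_fisher_z(criterion, team)
print(z.round(2))
print(round(float(np.tanh(z.mean())), 2))  # mean z converted back to a correlation
```

Averaging in the z metric and converting back with the hyperbolic tangent is a common way to summarize several correlations, although the paper itself states only that z values were used in the analyses of variance.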


Team Achievement 


Significant main effects on team achieve- 
ment were found due to Task structure 
FIG. 2. Main effects from analysis of variance performed on team achievement ($r_a$).


(F=45.13; df=1/108; p<.01), Task organ- 
ization (F = 11.86; df= 1/108; p< .01), 
and Blocks (F = 6.34; df = 4/432; p < .01). 
There was also evidence for a Blocks × Task
structure interaction (F = 3.21; df = 4/432;
p < .01).

The three main effects are shown in Figure 
2. Greater absolute achievement occurred 
under conditions of greater task structure; 
that is, those teams performing in an ecology 
which was 80% deterministic did better than 
those teams operating in an ecology which was 
only 40% deterministic. Absolute achievement, 
however, may not be the best comparison to 
employ between these two environments. 
Instead of examining absolute achievement 
it may be more appropriate to examine achieve- 
ment relative to maximum possible achieve- 
ment. It can be demonstrated algebraically 
that $R_e$ represents an upper bound limit on
achievement in a linear prediction task. Thus,
the ratio $r_a/R_e$ may be taken as a measure
of relative achievement. 

With this measure it can be asked whether 
teams having more structured tasks do better 
relative to theoretical maximum performance 
than do teams having less structured tasks. 
Figure 2 also shows these data—the answer is 
obviously the same as it was with the absolute 
achievement measure. 

The significant task organization effect was 
due to a sharp decrement in team achievement 


under conditions of high task organization. 
The presence of interrelationships among the 
task components thus contributed little to 
achievement, and proved to be a somewhat 
complex task property. 

The significant Blocks × Task structure
interaction was examined and found to be a 
result of more rapid learning for teams operat- 
ing in tasks of greater structure. Indeed, when 
the interaction means were analyzed using a 
Newman-Keuls procedure for ordered means, 
no learning trend was found for teams having 
low task structure. Apparently these teams 
reached a performance asymptote by the end 
of the first block of 40 trials and did not im- 
prove thereafter, while teams under high task 
structure continued to improve through the 
first four blocks. A small but significant decre- 
ment was observed for performance in Block 
5. This decrement has been observed before 
in tasks of this type (Dudycha & Naylor, 1966; 
Naylor & Schenck, 1968) and appears to be a 
motivational effect due to boredom or fatigue. 


Team Consistency 


Only one significant main effect was ob- 
served in the analysis of team consistency data. 
This was due to Task Structure (F = 16.90; 
df = 1/108; p < .01). This effect is shown in 
Figure 3. 

Note that the teams having the most 
structured tasks responded to the situation by 
performing in a more consistent fashion. These
teams adopted prediction strategies which 
were substantially more stable than those 
adopted by teams operating in task structures 
which were much less deterministic. This 
finding is also compatible with prior research 
on individual prediction behavior (e.g., see 
Dudycha & Naylor, 1966; Uhl, 1963) which 
has demonstrated that individual perform- 
ance consistency increases as the structure of 
the environment increases. Also worthy of 
mention is the fact that relative to the level 
of task structure ($R_e$), teams under the low
structure condition actually were more consistent
than were the teams under the high
structure condition (the ratio $R_c/R_e$ is shown
in Figure 3). This finding is also congruent 
with results obtained on individual prediction 
behavior. Thus, a decrease in task structure 
does not, apparently, result in a proportional 
decrease in the consistency of the prediction 
strategy employed by the teams. 


Team Strategy Matching 


The matching index, $r_m$, reflects the degree
to which the team’s best fitting policy equation 
or strategy “‘matches” the optimal prediction 
strategy. Thus, in a sense, it reflects the 
degree to which the team is performing in a 
truly optimal manner. Significant effects on 
matching behavior were observed due to Task 


FIG. 3. Task structure main effect from analysis of
variance performed on team consistency ($R_c$).


structure’ (f= 15.42; df = 1/108; » < .01), 
Task organization (F = 33.92; df = 1/108; 
p< .01), and Biocks (F = 35.10; df = 4/432; 
p < .01). Several interactions with blocks 
were also significant—Blocks X Task struc- 
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Fic. 4. Main effects from analysis of variance performed 
on team matching (rm). 
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Fic. 5. Blocks X Organization X Task structure 
interaction from analysis of variance performed on 
team matching (7,,). 


ture (F = 6.07; df = 4/432; p < .01), Blocks 
X Organization (P= 8.58; df=4/432; p<.01), 
and Blocks X Organization X Task structure 
(F = 5.50; df = 4/432; p< .01). The sig- 
nificant main effects are shown in Figure 4. 

More efficient matching was obtained under 
high task structure conditions and also with 
tasks having low organization. There was a 
substantial learning effect over blocks with 
the exception of Block 5 which showed a decre- 
ment in performance similar to that observed 
with achievement. Apparently, then, a highly 
structured environment permits a team not 
only to be more consistent in its strategy but 
also to develop a more appropriate response
strategy, appropriate in the sense that it
more closely approximates an optimal strategy.
Task organization, on the other hand, detracts 
from the ability of teams to develop appro- 
priate strategies. Since task organization is 
defined in terms of interrelationships between
or among task components, it would appear 
that when such interrelationships exist the 
difficulty of developing an optimal response 
strategy is substantially increased. 

The significant interactions with blocks were 
all indicative of differential learning rates under 
various conditions. Some were of considerable 
interest. For example, the Blocks X Organiza- 
tion interaction was a result of a noticeable 
learning trend during the last several blocks 
for the high organization groups as compared 
to virtually no learning being observed for the 
low organization groups. These latter groups 
had r_m values near .90 even during the first
block and showed little increase over that 
value throughout the remaining trials. The 
most logical interpretation of this interaction 
is that (a) strategies based solely on the 
individual cues are adopted very quickly by 
teams and once adopted are systematically 
maintained with little change and (b) strategy
sophistication based upon complex interrela- 
tionships between cues occurs much later in 
the task. 

This interpretation was further supported 
by the Blocks X Organization X Task struc- 
ture interaction (see Figure 5). Groups having 
no task organization component showed little 
increase in matching ability across blocks, 
regardless of the level of task structure (r_m
was consistently quite high for these groups). 
For groups having a high task organization 
component considerable improvement in match-
ing occurred, with the greatest rate of improve-
ment being observed with the groups working 
under low task structure. In this latter condi- 
tion, task organization accounted for all of the 
total task structure and indeed, if any match- 
ing was to occur, it had to be on the basis of 
the organizational component of the task. 
Thus, the task organization concept is clearly 
a difficult one for teams to acquire—much 
more difficult than the more straightforward 
individual cue-criterion relationships. How- 
ever, it can be learned and the degree to which 
it is learned appears to be related to the degree 
to which it contributes to total task structure. 
Also, task organization appears to be learned 
only after the more simple relationships are 
mastered. 
Predicting Team Responses from Member
Responses 


The team consistency measure discussed 
earlier represents the degree to which a team’s 
behavior may be predicted from the task 
components (cues) and their interrelation- 
ships. It seemed desirable also to examine the 
question of the degree to which team behavior 
could be predicted from a knowledge of indi- 
vidual member behavior. To accomplish this, 
individual member predictions were used as 
predictors of the team response in a multiple 
regression analysis. Thus, there were two 
predictors (two team members) which were 
regressed on the criterion of the composite 
team response. This was done for every team 
in every condition. The resulting multiple 
R’s were transformed into Fisher z values and 
analyzed using an ANOVA design identical 
to that used with the previously mentioned 
performance measures. 
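
As an illustration only, the computation just described can be sketched for a single team. The sketch assumes the standard closed-form expression for the multiple correlation with two predictors and the usual Fisher transformation z = arctanh(R). The function names are hypothetical, and nothing here is taken from the authors' own computing procedure.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def multiple_r_two_predictors(member1, member2, team):
    """Multiple R of the team response on the two members' predictions.

    Uses R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2).
    Assumes the two members' predictions are not perfectly correlated.
    """
    r_y1 = pearson(member1, team)
    r_y2 = pearson(member2, team)
    r_12 = pearson(member1, member2)
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    # Clamp to [0, 1] to absorb floating-point rounding.
    return math.sqrt(max(0.0, min(1.0, r_squared)))


def fisher_z(r):
    """Fisher r-to-z transformation; undefined for |r| = 1."""
    return math.atanh(r)
```

Under these assumptions, each team in each block would contribute one multiple R, and the transformed z values rather than the raw R's would enter the analysis of variance.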

Two significant effects were found. Work 
structure was significant (F= 4.80; df= 2/108; 
p < .05) as was the Blocks X Task organiza- 
tion X Task structure interaction. Team re- 
sponses in Model 1 were most predictable from 
member responses (R = .669), team responses
in Model 2 were least predictable (R = .528), 
and Model 3 led to intermediate predictability 
(R = .606). A Newman-Keuls test revealed 
the Model 1 versus Model 2 comparison 
to be the only significant difference among 
the three means. An examination of the inter- 
action effect indicated no easily interpretable 
patterns. 


DISCUSSION 


While the findings demonstrating the effect 
of both task structure and task organization 
upon team performance were not surprising, 
it was unexpected to see the lack of influence 
exerted by the work structure variable. On an 
intuitive basis, one would anticipate the dif- 
ferent work structures might, at the very least, 
comprise a very critical variable in those cases 
where task organization was high and perhaps 
a much less important variable under low task 
organization—however, no such interaction 
appeared. Thus, even in those cases where 
each S was able to see both cues, there was no 
facilitating effect regarding sensitivity to the 
interrelationship which existed between cues
in the high organization experimental con- 
ditions. 

The fact that work structure failed to influ- 
ence any of the three team performance meas- 
ures—achievement, consistency, and match- 
ing—might, at first glance, appear to be in 
direct contradiction to the research of Leavitt 
(1951), Guetzkow and Simon (1955), Shaw 
(1954a, 1954b), Shaw and Blum (1965), and 
Faucheux and MacKenzie (1966). However, 
this difference is more apparent than real. 
In all of these studies the concept of group 
(or team) structure was defined in terms of 
communication network arrangements between 
individual team members, not in terms of the 
way in which individuals were formally as- 
signed to subtask roles. Thus, these earlier 
studies were either manipulating or examining 
post hoc the influence of what Dickinson and 
Naylor (1966) would define as communication 
structure rather than work structure. The logi- 
cal separation of these two types of structure 
strikes the authors as being of critical im- 
portance. For example, one logical inference 
arising out of the comparison of the results 
of the present study to those of prior re- 
searchers is that communication structure may 
indeed be a much more critical variable in 
determining team performance than is work 
structure. Support for this notion can be found 
in the study of Williges, Johnston, and Briggs 
(1966) who found no significant effect on per- 
formance due to training conditions (a manipu- 
lation that would be classed under work 
structure in the Dickinson-Naylor paradigm) 
but did find a significant effect due to
communication category. However, neither 
that study nor the present research provides 
a clear test of the relative importance of these 
two types of structure. The implication, never- 
theless, seems quite clear and quite important 
for team performance. 

To accuse the work structure parameter 
as being completely without effect is not just, 
since the analysis using individual team mem- 
ber responses as predictors of composite team 
responses did show an effect due to the three 
different models. The composite team decision 
was most related to the predictions of the 
individual team members in Model 1. Since 
both team members had access to and were 
asked to use the same information in this
model, team member agreement should be 
highest in Model 1 and therefore (a) there 
should be less conflict in making a composite 
decision and (b) that decision should be fairly
compatible with each team member’s own 
view. What was particularly interesting, how- 
ever, was the finding that Model 2 team re- 
sponses were less predictable from its member 
predictions than were the composite responses 
in Model 3. In Model 2 each team member 
had access to the information being used by 
his partner but was instructed not to use this 
information in making his own predictions. 
Thus, as in Model 1 a team member is able 
to learn something about his partner’s re- 
sponse system, that is, the process of inter- 
personal learning can take place in Model 2 
just as easily as in Model 1. In Model 3, how- 
ever, there is absolutely no opportunity for a 
team member to learn anything about his 
partner’s response system nor can he use the 
information in the second cue in his own 
predictions since he has access to only one 
cue. Yet one finds that team responses are 
more systematically related to individual 
member predictions in Model 3 than in Model 
2. This implies that team members in Model 2 
had greater difficulty in arriving at a joint 
decision compatible with their individual 
decision than did teams in Model 3. This 
result appears directly in opposition to the 
notion in Hammond, Wilkins, and Todd (1966) 
that interpersonal conflict should be reduced 
in situations where knowledge of how the other 
person is interacting with the environment 
is available to S. 

The most potent variable influencing team 
performance was that of task structure. All 
three performance measures were related to 
this variable. The more structured the task, 
the higher was team achievement, team con- 
sistency, and team matching. These data were 
all consistent with previous findings on indi- 
vidual performance in multiple-cue inference 
situations (e.g., see Naylor & Schenck, 1968). 
Also consistent with prior research was the 
finding that team achievement relative to 
maximum possible achievement tended to 
increase as task structure increased and that 
team consistency relative to environmental 
consistency tended to decrease as task struc-
ture increased. This latter result, a consistent
finding in individual inference research, is
indicative of the unwillingness of teams to 
perform in a random fashion (have a random 
team strategy) even when they are presented 
with a task environment that is close to being 
a random environment (the low task structure 
conditions). Apparently teams, like individuals, 
will tend to adopt a systematic strategy even 
when that strategy can be of no earthly use 
to them as far as the task is concerned. 

The task organization variable, even though 
its presence added to the overall predictability 
of the task, clearly resulted in a decrement 
in the ability of teams to achieve and to 
develop an optimal prediction strategy (match). 
This seems compatible with the research of 
Briggs and Waters (1958) who found that, 
for individual Ss, tasks having interrelation- 
ships among task components were more 
difficult than tasks in which the components 
were independent. Also, the inference studies 
dealing with the ability of Ss to learn higher 
order stimulus-criterion relationships indicate 
the high difficulty level of this kind of task 
characteristic (Hammond & Summers, 1965; 
Summers & Hammond, 1966). As mentioned 
earlier, however, it was unexpected to find 
that, in the team inference situation, work 
structure failed to become a moderator of this 
outcome. 

Finally, the finding that the learning of 
strategies by teams appears to “progress” 
from individual cue-criterion relationships 
(task complexity) to the more complex inter- 
action relationships (task organization) sup- 
ports Fuchs’ (1962) “progression hypothesis”
concerning the acquisition of skilled perform- 
ance. Teams only began to acquire some skill 
with the organization aspect of the task after 
they had substantially mastered task com- 
plexity (individual cue-criterion relationships). 
This agrees with Fuchs’ hypothesis which
holds that more basic (zero order) functions 
are learned first and that learning then pro- 
gresses in a hierarchical fashion up through 
more complex (higher order) aspects of the 
task. 
MOTIVATION OF RESEARCH AND DEVELOPMENT
ENTREPRENEURS: 


DETERMINANTS OF COMPANY SUCCESS
Fifty-one technical entrepreneurs were studied, focusing upon the relationships
between their motivation and company performance. More specifically, the
entrepreneurs’ need for achievement, need for power, and need for affiliation
were related to the performance of the 51 small companies they founded and
operated. The results indicate that high need for
achievement and moderate need for power are associated with high company 
performance. The effects of need for power and need for affiliation on perform- 
ance seem to be derived through their influence on leadership styles. 


In an attempt to associate need for achieve- 
ment (n Ach) and economic development, 
McClelland (1961) looks to the entrepreneur 
as the one who translates n Ach into economic 
development. The entrepreneur in McClel- 
land’s scheme is “the man who organizes the 
firm (the business unit) and/or increases its 
productive capacity [p. 205].” 

The present authors’ aim was to test Mc- 
Clelland’s macro theory of economic growth 
at the micro level of organizational perform- 
ance. The principal interest in considering Mc-
Clelland’s work stems from his discussions of
who entrepreneurs are and of their different 
behavioral styles predicted from differences in 
need patterns. McClelland’s underlying as- 
sumption is that entrepreneurs have a high 
n Ach and that in business situations this 
high n Ach will lead them to behave in certain
ways and have certain tendencies. 

Based on McClelland’s discussion, the pres- 
ent authors raised the proposition that the 
degree to which an entrepreneur is motivated 
by n Ach directly influences his skill as an 
entrepreneur and consequently his enterprise’s 
performance. The major hypothesis to be 
tested concerns the relationship between an 
entrepreneur’s level of n Ach and his com- 
pany’s performance. 

Schrage (1965), in testing the relationship 
between the entrepreneur’s n Ach and com- 
pany performance, reported that companies 
run by entrepreneurs who have a high n Ach 
tend to have either high profits or losses (>
3% of sales), while those run by low n Ach 
entrepreneurs tend to have low profits or 
losses (< 3% of sales). Reanalysis of his 
data by the present authors casts consider-
able doubt on the validity of his findings. The
primary source of doubt was a discrepancy 
between the scores Schrage used for n Ach 
and those subsequently derived when the same 
protocols were rescored by the Motivation 
Research Group at Harvard. The fact that 
his results departed markedly from established 
theory further substantiates this concern. 

In addition to the relationship between 
n Ach and company performance, the au- 
thors were interested in the interrelationships 
among three needs, n Ach, need for power 
(n Pow), and the need for affiliation (n Aff), 
with respect to company performance. n Pow 
is defined by Atkinson (1958) as “that dis-
position, directing behavior toward satisfac-
tions contingent upon the control of the means
of influencing another person [p. 105].”

n Aff is concerned with the establishment, 
maintenance, or restoration of positive affec- 
tive relationships with other people, that is, 
friendships. Statements of liking or desire to 
be liked, accepted, or forgiven are manifesta- 
tions of this motive (Atkinson, 1958). Mc- 
Clelland’s (1961) discussion of the joint 
product of n Pow and n Aff in relation to 
dictatorship stimulated this aspect of the in- 
quiry. He found that n Pow was not related 
to economic growth but was related to style 
of leadership. More specifically, the combina- 
tion of a high n Pow and a low n Aff was 
associated with the tendency of a country 
to resort to totalitarian methods as a style 
of leadership. 

The present authors propose that n Ach 
has behavioral manifestations different from those of
either n Pow or n Aff in terms of the individ- 
ual’s relationships with people. n Pow and 
n Aff are interpersonally oriented needs. Im- 
plicit in their definitions is the existence of 
other human beings whom the n Pow or n Aff 
motivated individual can influence and con- 
trol, or with whom he can be friends. n Ach, 
on the other hand, seems to be a more in- 
ternalized need. The n Ach motivated in- 
dividual may need other people to help him 
satisfy his n Ach, but the nature of his rela- 
tionship with them, or more appropriately his 
effectiveness with them, will be determined 
by other needs. The authors suggest that 
n Ach is a primary consideration determining 
noninterpersonally related behavior that leads 
to high company performance. n Pow and 
n Aff are primary considerations determining 
interpersonal behavior that affects company 
performance. n Pow and n Aff, then, can be 
looked upon as having strong implications 
as determinants of management style. 

Numerous other attempts have been made 
to identify those personality traits which 
differentiate leaders from nonleaders or effec- 
tive leaders from ineffective leaders. These 
studies have, in general, failed to find any 
consistent pattern of differentiating traits. 
In a broad sense, the present research is 
analogous to these prior efforts in that it seeks 
to explain company performance on the basis
of certain personality characteristics of the
president. Steps were taken, however, in an-
ticipation of two potential problem areas: (a)
that personality description and measurement
themselves are not yet adequate; (b) that
the groups studied have usually been markedly 
different from one another and this may have 
concealed a relation between personality and 
the exercise of leadership that would have 
appeared within a more homogeneous set of 
groups or situations. 

The major personality variable of interest 
in the present study is the need for achieve- 
ment. On the basis of the existing body of re- 
search, McClelland’s version of the Thematic 
Apperception Test (TAT) was deemed a 
reliable means of measuring n Ach (Atkinson, 
1958; McClelland, 1961). With respect to the 
second problem area, a very homogeneous set 
of groups has been examined, thus mitigating 
the potential influence of the “situation.” 

For these reasons, the focus in this study 
was upon the new, small, technically based 
enterprise. The entrepreneur president of 
such a company has placed himself in a 
situation where his n Ach, to the extent that 
it exists, can readily be translated into con- 
crete behavior. He starts the company, hires 
the people, and motivates them, sells, plans, 
takes risks, and so on. It is his personality 
and motivation that mold the company in its 
every aspect. Furthermore, in such situations, 
the entrepreneur’s efforts and decisions are 
likely to be very important in determining the 
initial success of the venture. 


METHOD 


Fifty-one small technically based companies in 
the Boston area comprised the sample. All were at 
least 4 but less than 10 yr. old at the time of the
study and all were “spin-offs” from one of the 
Massachusetts Institute of Technology research lab- 
oratories or industrial laboratories around the Boston 
area. They ranged in business activities from service, 
such as computer software development, to manu- 
facturing, such as special purpose computers and 
welded modules. Company and entrepreneurial
personality information were gathered from the 
entrepreneur president. The typical entrepreneur, 
based on the central tendencies for the total sample 
of entrepreneurs, was approximately 36 yr. of age 
when he started his new enterprise, was educated 
to the master’s degree level, and had considerable 
TABLE 1
MEANS, MEDIANS, AND RANGES OF
VARIABLES MEASURED

Variable       M      Mdn     Range
n Ach          5.9    5.0     -5 to 18
n Pow          9.7    9.5     0 to 19
n Aff          3.5    3.0     0 to 16
Growth rate    .40    .375    0.0 to 2.10





experience at a technically advanced research lab- 
oratory prior to starting his new enterprise. Among 
the information gathered were company yearly sales 
figures and scores on McClelland’s version of the 
TAT for each entrepreneur. The yearly sales figures 
were used as the basis for determining the growth 
rate, defined in detail below. The index of perform- 
ance was derived from the growth rate. The TATs 
were scored for n Ach, n Pow, and n Aff by the 
Motivation Research Group at Harvard University. 
The resulting scores were the basis for analysis 
of the strength of various needs in relation to per- 
formance. 

Growth rate is defined as follows: annual increase 
in the logarithm of sales volume between the second 
and most recent year reported. For example, Com- 
pany A is 7 yr. old. Its second-year sales were 
$100,000 and its last year (seventh) sales were 
$950,000. These two sales values are plotted on semi- 
log paper. The growth rate is indicated by the 
percent rate of change from year to year. This is, 
of course, constant over the 7 yr. The growth rate 
in this case would be approximately .56. Table 1 
summarizes the general characteristics of the four 
variables with which this paper is concerned. 
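
The Company A example can be checked numerically. The sketch below assumes that the straight line on semi-log paper corresponds to constant compound growth over the five yearly intervals between the second and seventh years; the function names are hypothetical and the authors' exact graphical procedure is not reported.

```python
import math


def annual_log_growth(first_sales, last_sales, first_year, last_year):
    """Average annual increase in the natural logarithm of sales."""
    return (math.log(last_sales) - math.log(first_sales)) / (last_year - first_year)


def annual_percent_growth(first_sales, last_sales, first_year, last_year):
    """Constant year-to-year proportional change implied by the log slope."""
    slope = annual_log_growth(first_sales, last_sales, first_year, last_year)
    return math.exp(slope) - 1.0


# Company A: $100,000 in Year 2 and $950,000 in Year 7.
company_a_rate = annual_percent_growth(100_000, 950_000, 2, 7)
```

Under this assumption the computed rate is about .57, which agrees with the approximate value of .56 read from the graph.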

The method of analysis in all cases was a com- 
parison of high, moderate, and low groups. Equality 
of sample size, within the limits of tied observa- 
tions, was the criterion used in making these group- 
ings. Standard correlational techniques were feasible 
in many cases, and, where appropriate, coefficients 
are presented in footnotes. However, since such tech- 
niques often mask nonlinear trends in relationships, 
the Mann-Whitney U test, one of the most powerful 
of the nonparametric statistical tests, was used. 
Furthermore, correlation techniques focus on dif-
ferences between two variables based on individual
differences from case to case. On the other hand, 
the Mann-Whitney U, a difference in medians test, 
analyzes differences between characteristics of groups 
of data. The authors feel that TAT scoring pro- 
cedures are not yet precise enough to enable re- 
searchers to use individual differences as the basis 
for comparison. 
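
For readers unfamiliar with the test, a minimal sketch of a Mann-Whitney U comparison between two groups follows. It assumes midranks for tied observations and a large-sample normal approximation without tie or continuity correction. The authors do not state which computational variant or tables they used, so this is one common form and not necessarily theirs; the function names are hypothetical.

```python
import math


def midranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, two-tailed p) using the normal approximation.

    No tie correction or continuity correction is applied, so p is
    approximate, particularly for small groups.
    """
    n1, n2 = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_tailed
```

A one-tailed probability, where a directional hypothesis was stated in advance, would be half the two-tailed value under this approximation.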


3 Average intercoder reliabilities of scores from the 
Motivation Research Group are in the high .80 
range. 
RESULTS


Analyses of the relationship between the 
three needs, n Ach, n Pow, and n Aff, and 
their relation to company performance are 
presented in this section. In addition, some 
exploratory results will be presented that 
focus on the question: Is there a pattern or 
combination of needs that is related to
high company performance? In other words, 
one set of analyses will focus on the direct 
relationship between performance and vary- 
ing degrees of strength in a single need, while 
a secondary focus will explore effects of 
several needs taken together on company 
performance. 


Relationship between the Three Needs 


The data in Table 2 suggest that, within 
this sample, n Ach, n Pow, and n Aff are not 
completely independent.⁴ n Ach appears to be
positively related to n Pow and negatively re- 
lated to n Aff, while n Pow is negatively re- 
lated to n Aff. It is important to note, how- 
ever, that in all cases the relationship is non- 
linear. In the case of n Ach versus n Pow, for 
example, only the low n Ach group has a sig- 
nificantly different n Pow score. No differ- 
ences in n Pow are observed when a com- 
parison of the high versus moderate n Ach 
groups is made. A similar phenomenon is 
present in each relationship. In other words, 
the correlation coefficients reported in Foot- 
note 4 are heavily influenced by a small 
subset of the total distribution of need scores. 

With these qualifications in mind, it is 
concluded that the three needs are moderately 
related. Where the relationship between each 
need and company performance is examined, 
an attempt will be made to take into account 
this lack of independence. 
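
The rank correlations reported for the three needs can be illustrated with a short sketch of Kendall's tau. The tau-b form, which adjusts for ties, is assumed here because need scores are integers and ties are likely; the authors do not say which form they computed, and the function name is hypothetical.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length lists of scores."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from every count
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_pairs = concordant + discordant
    denominator = math.sqrt((untied_pairs + tied_x_only) * (untied_pairs + tied_y_only))
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator
```

Because the coefficient pools every pair of cases, a few extreme scores in one subgroup can produce a nonzero value even when the remaining groups do not differ, which is the qualification raised above.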


Need Strength versus Company Performance 


The major hypothesis in this study pre-
dicts a direct and positive relationship be-
tween an entrepreneur’s n Ach and the per-
formance of his company. No directional hy-
potheses were specified concerning the rela-
tionships between n Pow, n Aff, and com-
pany performance.⁵


4 The following are the Kendall Tau correlations
between three needs (two-tailed test). n Ach versus
n Pow: T = .370, p < .01, N = 51. n Ach versus
n Aff: T = -.259, p < .01, N = 51. n Aff versus
n Pow: T = -.233, p < .05, N = 51.

Referring to Table 3, it can be seen that, 
within the range of moderate to high n Ach, a 
very marked positive relationship exists be- 
tween n Ach and company performance. The 
growth rate of those companies led by entre- 
preneurs with a high n Ach was almost 250% 
higher (.73 versus .21) than those companies 
led by entrepreneurs with a moderate n Ach. 
Here again, however, the relationship is not 
purely linear since the low n Ach group has a 
mean performance score slightly higher than 
the moderate n Ach group but still signifi- 
cantly lower than high n Ach group. 
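
The "almost 250%" figure follows directly from the two group means quoted above. The brief check below uses only those two values; the function name is hypothetical.

```python
def percent_higher(value, baseline):
    """Percentage by which value exceeds baseline."""
    return 100.0 * (value - baseline) / baseline


# Mean growth rates quoted in the text: high n Ach .73, moderate n Ach .21.
high_vs_moderate = percent_higher(0.73, 0.21)
```

The result is roughly 248%, consistent with the authors' rounding.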

n Pow, as can be seen from Table 3, is 
completely unrelated to company perform- 
ance. n Aff, on the other hand, exhibits a 
mildly negative, nonlinear, relationship to 
company performance. The data were then 
examined to see if the observed relationship 
between n Ach and n Aff influenced the rela- 
tionship found between n Ach and perform- 
ance. No such contamination was found. Of 
those who were classified in the low n Aff 
group (N = 13), only six fell into the high


5 The following are the Kendall Tau correlations
between the three needs and company performance
(growth rate). n Ach versus performance: T = .15,
p < .08, N = 51 (one-tailed). n Pow versus perform-
ance: T = .05, p < .64, N = 51. n Aff versus per-
formance: T = -.11, p < .28, N = 51.


TABLE 2
RELATIONSHIP BETWEEN THE THREE NEEDS

Grouping need and range      Need     High        n     Moderate    n     Low        n
n Ach                                 (≥ 9)             (4 ≤ X ≤ 8)       (≤ 3)
                             n Pow    11.3 (A)    14    11.1 (B)    19    6.8 (C)    18
                             n Aff    2.0 (D)     14    3.9 (E)     19    4.4 (F)    18
n Pow                                 (≥ 13)            (8 ≤ X ≤ 12)      (≤ 7)
                             n Aff    2.8 (G)     15    3.0 (H)     19    4.7 (I)    17

Note.—Mann-Whitney U test results: A versus B, p < .60;
A versus C, p < .003; B versus C, p < .007; D versus E,
p < .13; D versus F, p < .02; E versus F, p < .45; G versus H,
p < .60; G versus I, p < .11; and H versus I, p < .09.
TABLE 3
RELATIONSHIP BETWEEN n ACH, n POW, AND
n AFF AND GROWTH RATE

Need     Mann-Whitney     Strength                  N              Mean growth
         U results(a)                                              rate
n Ach    A                High (≥ 9)                14             .73
         B                Moderate (4 ≤ X ≤ 8)      19             .21
         C                Low (≤ 3)                 18             .36
n Pow    A                High (≥ 13)               15             .38
         B                Moderate (8 ≤ X ≤ 12)     19             .47
         C                Low (≤ 7)                 17             .36
n Aff    A                High (≥ 4)                20             .33
         B                Moderate (2 ≤ X ≤ 3)      [illegible]    .30
         C                Low (≤ 1)                 13             .67

(a) Results of Mann-Whitney U tests: n Ach versus growth
rate: A versus B, p < .0001; A versus C, p < .006; B versus C,
p < .08, one-tailed. n Pow versus growth rate: A versus B,
p < .80; A versus C, p < .90; B versus C, p < .80, two-tailed.
n Aff versus growth rate: A versus B, p < .81; A versus C,
p < .16; B versus C, p < .10, two-tailed.


n Ach group. n Ach, in other words, directly 
affects company performance, independent of 
its relationship to n Aff. 

The results of this section are summarized 
graphically in Figure 1. The percentage of 
companies within each subgroup (high, mod- 
erate, low), whose performance is above that 
of the median for the total sample of 
entrepreneurs, is plotted for each of the needs. 
Seventy-nine percent of those companies led 
by entrepreneurs whose n Ach was high had a 
growth rate which was above the median for 
the total sample of entrepreneurs. 
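
The quantity plotted in Figure 1 can be expressed as a short computation. The sketch below assumes that "above the median" means strictly greater than the median growth rate of the whole sample; the function name is hypothetical and no company-level data from the study are reproduced.

```python
from statistics import median


def percent_above_median(group_values, all_values):
    """Percentage of a subgroup whose value exceeds the total-sample median."""
    cutoff = median(all_values)
    above = sum(1 for value in group_values if value > cutoff)
    return 100.0 * above / len(group_values)
```

With 14 companies in the high n Ach group, the reported 79% would correspond to 11 of the 14 companies. That count is inferred here from the percentage and is not given by the authors.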


Joint Products of Needs versus Performance 


The previous section focused on variations 
in company performance resulting from each 
of the three needs (n Ach, n Pow, and n Aff) 
taken singly. The aim in this section is to
explore the question of whether or not any 
pattern of need strengths appears to be as- 
sociated with high company performance. In 
Fig. 1. Percentage of companies above median growth rate for total sample.


examining the data, it was noticed that, in 
addition to the very wide differences in com- 
pany performance noted between high, mod- 
erate, and low n Ach groups, there existed 
substantial variations in company perform- 
ance within each of these three groups. In 
other words, although the high n Ach group 
exhibited very high performance in com- 
parison with the moderate and low n Ach 
groups, the range of performance scores within 
the high n Ach group was from .14 to 2.10. 
Similar within-group ranges were observed in 
the other two n Ach groupings. 

An attempt was made, therefore, to de- 
termine whether these within-group variations 
could be attributed to variations in the 
strengths of the other two needs being in- 
vestigated, n Pow and n Aff. The authors 
have further split the samples into high versus 
low performers (at the median performance 
score within each n Ach group) and compared 
levels of n Pow and n Aff within each of 
these new subgroups. 


The following patterns emerge from the 
data summarized in Table 4. Within the low 
n Ach group, variations in performance are 
unaffected by variations in n Pow or n Aff. 
Within the moderate n Ach group, n Pow is 
identical for high versus low performers, while 
high performers within this group have a sig- 
nificantly higher n Aff. Finally, within the 
high n Ach group, n Aff is identical for high 
versus low performers, while high performers 
within this group have a significantly lower 
n Pow. 

In summary, the highest performing com- 
panies in this sample were led by entrepreneurs 
who exhibited a high n Ach and a moderate 
n Pow. Those entrepreneurs who had a high 
n Ach coupled with a high n Pow performed 
less well than their high n Ach counterparts 
who exhibited only a moderate level of n Pow.⁶


6 When the authors use the phrases “moderate 
n Pow” or “high n Pow,” they are using as their 
reference point the distribution of scores observed in 
this study sample. Their specification, for example, of 



TABLE 4
RELATIONSHIP BETWEEN PERFORMANCE AND n POW AND n AFF WITHIN HIGH, MODERATE, AND LOW n ACH GROUPS

               High n Ach (≥ 9.0)            Moderate n Ach (4 ≤ X ≤ 8)       Low n Ach (≤ 3.0)
Performance    Low            High           Low          High               Low            High
               (< .59)        (> .59)        (< .13)      (> .13)            (< .26)        (> .26)
N              [illegible]    [illegible]    9            10                 [illegible]    [illegible]
n Pow          13.1 (A)       9.4 (B)        11.0 (C)     11.0 (D)           7.0 (E)        6.7 (F)
n Aff          2.0 (G)        2.0 (H)        2.2 (I)      [illegible] (J)    4.9 (K)        4.0 (L)

Note.—Mann-Whitney U test (two-tailed): A versus B, p < .08; C versus D, p < .40; E versus F, p < .50; G versus H,
p < .50; I versus J, p < .02; K versus L, p < [illegible].

Within the moderate n Ach group, higher per- 
forming companies were led by entrepreneurs 
who had a high n Aff. 


DISCUSSION 


The major hypothesis tested in this study 
predicted a positive relationship between an 
entrepreneur’s level of n Ach and his com- 
pany’s performance. The authors’ findings 
strongly support the conclusion that high n 
Ach is associated with high company perform- 
ance, but the relationship between n Ach and 
performance is not linear across the entire 
range of n Ach scores. The relationship is 
markedly linear for the entrepreneurs whose 
n Ach is moderate to high. However, those
entrepreneurs who scored low in n Ach were 
not significantly lower performers than those 
whose n Ach was moderate. 

In an attempt to explain this nonlinearity 
it seems reasonable to assume that other needs 
or factors are influencing the entrepreneurial 
behavior of individuals who are not moderate 
to high in their level of n Ach. It is extremely 
likely that some threshold level of n Ach is 
necessary before one could assume that the 
strength of the need is significantly affecting 
the individual’s behavior. In addition, it is 
obvious that the authors do not see n Ach as 
being the only (or for that matter the most 
important) factor that influences company 


high n Pow as being > 13.0 was made prior to the 
analyses under discussion in this section. Conse- 
quently, classification of a mean n Pow of 13.1 as 
high and a mean n Pow of 9.4 as moderate is
consistent with their a priori definitions. 


performance. They are arguing, however, that 
where the need exists in sufficient strength to 
influence entrepreneurial behavior  signifi- 
cantly, company performance in general will 
improve. 

A secondary aim in this study was to ex- 
plore the question of whether a certain pat- 
tern or combination of needs was most often 
associated with high performance. In the in- 
troduction to this paper, it was suggested 
that n Pow and n Aff were needs whose behavioral
manifestations were interpersonal in
character. Satisfaction of these two needs, by 
definition, involves relationships with other 
people. n Ach, on the other hand, is much 
more individualistic in character. Satisfaction 
of one’s n Ach, although often involving contact
with other people, has behavioral manifestations
which are qualitatively different in nature from
those of either n Pow or n Aff.

The results of this study suggest that the 
combination of a high n Ach and a moderate 
n Pow characterizes the highest performing 
companies in the sample. In other words, a 
high (as opposed to moderate) level of n Pow 
appeared to counterbalance to some extent 
the positive benefits of a high level of n Ach. 

One possible explanation for this finding 
lies in the relationship between n Pow and 
various styles of leadership. The lower an 
individual’s n Pow, the more permissive or 
laissez-faire his style of leadership; the higher
his n Pow, the more autocratic or authori- 
tarian his style of leadership. The middle of 
the n Pow spectrum represents a mixed in- 
fluence of the two extreme styles which is 
best described as democratic.7 Prior research
(Lippitt & White, 1958) has suggested that 
in certain situations the most effective leader- 
ship style is democratic and that performance 
of groups controlled in this manner is better 
than that of groups controlled by either of 
the other two styles. 

Somewhat more difficult to explain is the 
finding concerning the positive differential ef- 
fect on company performance, within the 
moderate n Ach group, of a high versus low 
n Aff level. It may be that for those indi- 
viduals who have only a moderate level of 
n Ach, a high level of n Aff enables them to 
form close interpersonal relationships with 
their colleagues. In this way, the moderate 
n Ach individual may be able to acquire the 
assistance he needs from his colleagues, some 
of whom may well have a higher level of 
n Ach than he himself has. 


7 The authors have assumed, of course, that high
n Pow leaders are more likely to exercise an auto- 
cratic style of leadership and low n Pow leaders a 
laissez-faire style. 

Interpretations in this area of need combinations
must be viewed, at this point, as
speculative and suggestive of further research. 
Analysis of the results of this study indicates 
that more complex relationships do have to 
be examined if a realistic view of perform- 
ance determined by personality is to be 
gained. Future research should include repli- 
cations of this study and the use of larger 
samples for the investigation of these hy- 
potheses. 


A new index, $r_p$, is proposed for the evaluation of ratee relevance in peer-nomination
data. This index is shown to be a continuous linear function of the observed and
random expected variance. The use of the $r_p$ index for editing, weighting, diagnosing,
and evaluating saliency is discussed.


A basic assumption that has been made by 
investigators using peer-nomination techniques 
is that the nominations or ratings they obtain 
veridically reflect the status of the ratees. This 
assumption was brought into serious question 
by Passini and Norman (1966). In the 1966 
study Passini and Norman factor analyzed 
peer-nomination data that had been obtained 
from complete strangers. The factor structure 
that emerged from this analysis was highly 
similar to the factor structures that had 
emerged from previous analyses using the 
identical instrument with individuals who were 
well acquainted. As a tentative explanation of 
these rather surprising results the authors sug- 
gested the concept of shared implicit personal- 
ity theories among raters. They said: 


. . . if we accept the position that each rater brings 
to the situation an implicit personality theory which in 
certain aspects is similar to that of the other persons in 
the group and if observable features of the dress and 
manner of the participants are sufficient to provide an 
entree to one or more components of each of these com- 
mon attribute clusters, then the interrater agreement 
and factorial structure obtained in the present study 
begins to seem a little less incredible [Passini & Norman, 
1966, p. 48]. 


Granting the explanation offered by Passini 
and Norman, the problem that remains is to 
determine the degree of veridicality in any set 
of ratings of ratees. In 1966 Norman and Gold- 
berg confirmed the fact that the same factor 
structure obtained from well-acquainted Ss 
could be obtained when the raters had abso- 
lutely no contact with the ratees. They used a 
Monte Carlo technique which simulated ratee- 


1 This study was supported in part by Research Grant 
MH 07195 from the National Institute of Mental 
Health, United States Public Health Service, Warren
T. Norman, project director. 
2 Requests for reprints should be sent to Frank 
T. Passini, who is now at the University of Colorado, 
Cragmor Road, Colorado Springs, Colorado 80907. 
rater independence while preserving the concept
of raters having a shared conception of
trait organization. The main thrust of Norman 
and Goldberg’s effort, however, was the de- 
velopment of two criteria for the evaluation of 
the extent of ratee relevance in peer-nomination 
data. 

These two measures were dubbed “score
reliability” ($r_s$) and “rating reliability” (r;,).
Score reliability is an index derived from a
comparison of the expected random variances
in a set of ratings and the obtained variance.
Rating reliability is computed from $r_s$ by
reversing the generalized Spearman-Brown 
formula. Both reflect the degree of interrater 
agreement in the data. Norman and Goldberg 
(1966) applied these techniques to the Monte 
Carlo data in addition to four sets of empirical 
data. The obtained values were seen to reflect 
the length of acquaintanceship and intimacy 
among Ss used; that is, values of near zero 
were obtained for the Monte Carlo data (no 
acquaintanceship), while high values were 
obtained from the data generated by Peace 
Corps trainees. 

The formula the authors gave for computing
$r_s$ is

$$r_s = 1 - \frac{V_r}{V_o},$$

where $V_r$ = random expected
variance and $V_o$ = observed variance. It
can be seen readily that when there is perfect
agreement among the raters as to a particular
ratee’s status the observed variance will be
zero, and the defining expression of $r_s$ becomes
undefined.

Because of this discontinuity and the potential
difficulty of interpreting a nonlinear
function of observed variance, the present
authors would like to suggest another index,
$r_p$, as a replacement for $r_s$. The authors will
attempt to show that this index is more directly
interpretable as an index of interrater agreement
and in addition can readily be used to
edit and weight data obtained from peer-nomination
formats as described above. Also,
the use of $r_p$ as a diagnostic tool will be discussed
briefly.

As Norman and Goldberg (1966) have
pointed out, the estimate of ratee relevance is a
function of the relationship between random
variance and observed variance as long as
independence of rating is insured. The proposed
new index, $r_p$, is also a function of these two
values. Thus, $r_p$ is defined as:

$$r_p = 1 - \frac{V'_o}{V'_r},$$

where $V'_o$ is the variance of a row (the nominations
for a single ratee), and $V'_r$ is the random
(expected) variance of any single row. It can
be seen that this index will vary from a maximum
of 1.00 when there is complete agreement
among the raters about a certain ratee to a
minimum of −.50 when there is maximal disagreement
among the raters for groups of size
$3x + 1$ (see Figure 1).3
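
As a rough numerical check of the stated bounds, the sketch below computes the index for single rows of nominations. It assumes the index has the form $r_p = 1 - V'_o / V'_r$, which is linear in the observed row variance and reproduces the stated maximum of 1.00 and minimum of −.50. It also assumes, purely for illustration, that each rater scores a ratee +1, 0, or −1 in equal thirds, so that a random row has variance 2/3; the scoring scheme is not described in this section, and the function and variable names are hypothetical.

```python
from statistics import pvariance

# Assumed expected variance of a random row under +1/0/-1 scoring in equal thirds.
V_R = 2.0 / 3.0

def ratee_relevance(row, expected_variance=V_R):
    """Assumed form of the index: r_p = 1 - V'_o / V'_r for one ratee's row."""
    return 1.0 - pvariance(row) / expected_variance

agree = [1, 1, 1, 1, 1, 1]        # complete agreement among six raters
chance = [1, 1, 0, 0, -1, -1]     # row variance equal to the random expectation
split = [1, 1, 1, -1, -1, -1]     # maximal disagreement

r_agree = ratee_relevance(agree)
r_chance = ratee_relevance(chance)
r_split = ratee_relevance(split)
```

Under these assumptions the three rows give values of approximately 1.00, 0, and −.50, matching the range described in the text for groups of size 3x + 1 (here seven members, six raters per ratee).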

The interpretation of positive values of $r_p$
offers no problems; however, the interpretation
of negative values warrants a few remarks.
Negative $r_p$ values indicate that there is more
than just chance disagreement among the raters
about the status of a particular ratee on a
scale. This disagreement could arise from at
least three sources: (a) The scale has been so
constructed that it is capable of diametrically
opposed interpretations; (b) subsets of raters
have taken the scale and fitted it into their own
idiosyncratic antagonistic implicit personality
theories; (c) the ratee behaves toward different
subsets of raters in a diametrically opposed
manner.

If an investigator is interested in how traits, 
as measured by scales, cluster, then he can increase
the effectiveness of his clustering procedures
by editing out ratees who have a negative
$r_p$ value. Any such editing, of course, will
affect the generalizability of the analysis re- 


3 It should be noted that under conditions of forced
nomination as previously described, $V'_r$ is a constant for
groups of the same size. For all groups whose sizes are of the
form $3x + 1$, the value of $V'_r$ is a constant.
For groups whose sizes are of the form $3x$ or $3x + 2$,
the value of $V'_r$ approaches this same constant as a function
of group size (see Figure 1).

Fig. 1. Slope of the $r_p$ function for groups of
various sizes and slope correction.


sults and may substantially reduce the ns on
which correlations may be computed for pur- 
poses of cluster analysis or factor analysis. 

A not so apparent use of the $r_p$ index is its
application as a weighting factor to ratee
scores. The rationale for this use is as follows:
One would ordinarily be most certain of a
ratee’s status with respect to any given trait
when all his peers agree as to his status on that
trait. If one is to employ estimates of status on
a trait, for whatever purpose, it might be reasonable
to weight those estimates by a function
of the certainty upon which they are based.
Thus, $r_p$ as an index of ratee relevance could
be used as a certainty weighting factor.

The use of single $r_p$’s as tools for editing or
weighting factors has been discussed thus far.
Before pointing out another use of $r_p$, the
statistic $\bar{r}_p$ must be introduced: $\bar{r}_p$ is the mean
of the $r_p$’s for a given scale and as such is a
measure of average scale relevance. This interpretation
of $\bar{r}_p$ follows from the fact that $r_p$ is a
linear function of $V'_o$ (Figure 1) and acceptance
of $r_p$ as an index of the scale’s relevance for the
individual ratees.

As an example of the use of $\bar{r}_p$ and $r_p$, suppose
one has a scale that has a low $\bar{r}_p$ value but
a single high $r_p$ value. Given a scale to measure
subtle variable x which is related to deviant
behavior y, a high $r_p$ value coupled with a low
$\bar{r}_p$ value would be highly diagnostic for the
individual. Conversely, if developing an instrument
with new scales, a low $\bar{r}_p$ with a few high
$r_p$’s would be diagnostic for the scale. That is,
intensive study of the individuals with high
$r_p$’s could serve to highlight those characteristics
to which the scale is mainly sensitive.
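
The diagnostic use just described (a low scale mean with one or a few high individual values) might be sketched as follows. The sketch assumes the index form $r_p = 1 - V'_o / V'_r$ and an illustrative +1/0/−1 scoring in thirds with random row variance 2/3; the ratings, the 0.8 threshold, and all names are hypothetical and are not taken from the article.

```python
from statistics import mean, pvariance

V_R = 2.0 / 3.0  # assumed random row variance under +1/0/-1 scoring in thirds

def ratee_relevance(row, expected_variance=V_R):
    """Assumed form of the index for one ratee's row of nominations."""
    return 1.0 - pvariance(row) / expected_variance

def scale_relevance(rows_by_ratee, expected_variance=V_R):
    """Mean of the individual indices for one scale (the r-bar statistic)."""
    return mean(ratee_relevance(row, expected_variance)
                for row in rows_by_ratee.values())

def salient_ratees(rows_by_ratee, threshold, expected_variance=V_R):
    """Ratees whose individual index meets a (hypothetical) threshold."""
    return [name for name, row in rows_by_ratee.items()
            if ratee_relevance(row, expected_variance) >= threshold]

# Hypothetical nominations on one scale: raters agree only about ratee "A".
ratings = {
    "A": [1, 1, 1, 1, 1, 1],
    "B": [1, 0, -1, 1, 0, -1],
    "C": [1, -1, 0, 0, 1, -1],
    "D": [0, 1, -1, -1, 0, 1],
}
r_bar = scale_relevance(ratings)
flagged = salient_ratees(ratings, threshold=0.8)
```

Here the scale mean is low (about .25) while ratee "A" stands out, which is the pattern the text treats as diagnostic either for the individual or for the scale.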

The above use of $\bar{r}_p$ suggests an additional
use for this index, that is, the evaluation of
scale saliency across rating groups. There are,
however, some problems in combining $\bar{r}_p$’s.
First, it should be recognized that the range of
$\bar{r}_p$ will vary as a function of $n$. This variability
of range, however, does nothing to the quality
of scale values for groups of the same size.
There is a further problem in combining $\bar{r}_p$’s
from groups of different sizes in that the slope
of the $r_p$ function varies with group size. A
correction factor based on the Pythagorean
theorem may be applied to equate coefficients
for different groups.

The intention of the correction factor is to
equate the slopes of the $r_p$ functions for varying
size groups. Any slope could be selected as a
standard, but for simplicity the slope of a group
of size $3x + 1$ is selected as the reference (see
Footnote 3). The correction formula is

$$\bar{r}_{p_c} = \bar{r}_{p_o} \, \frac{\sqrt{(V'_{o_r})^2 + (1.00 - r_{p_r})^2}}{\sqrt{(V'_{o_o})^2 + (1.00 - \bar{r}_{p_o})^2}},$$

where $\bar{r}_{p_c}$ is the corrected index, $\bar{r}_{p_o}$ is the observed
$\bar{r}_p$, $V'_{o_r}$ is the reference observed variance,
$r_{p_r}$ is the reference $r_p$, and $V'_{o_o}$ is the obtained
$V'_o$ corresponding to $\bar{r}_{p_o}$. Note $r_{p_r} = \bar{r}_{p_o}$. The
correction factor appearing in the formula is
the ratio between square roots of the sum of
the squared sides of two similar triangles. This
correction factor will equate all observed $\bar{r}_p$’s
to a common standard and therefore allow
direct combining for purposes of comparison.
For positive values of $\bar{r}_p$ when group size exceeds
12, the correction would seem unnecessary
due to the very small magnitude of change.
Two examples of the correction are illustrated
in Figure 1 for groups of six and eight with an
observed $\bar{r}_p$ of .10 (see Figure 1).

In view of the linearity of the $r_p$ function
and the utility of the $r_p$ index for editing,
weighting, diagnosing, and evaluating saliency,
this index should be of considerable use in the
development of new peer-nomination instruments.
In addition, the $r_p$ function should
serve as a useful adjunct in the evaluation of
data obtained from peer-nomination instruments
in current use.


SELF-ESTEEM AS A MODERATOR IN VOCATIONAL CHOICE:
REPLICATIONS AND EXTENSIONS 1


ABRAHAM K. KORMAN 2


New York University 


The purpose of the research reported in this paper was to test in different 
types of vocational choice situations the hypothesis that self-esteem operates 
as a moderator on the vocational choice process in that high self-esteem (HSE) 
individuals are more likely to seek self-fulfillment than are low self-esteem 
(LSE) individuals. Four separate studies were all supportive of the proposition. 


There are now several studies in research 
literature which support the hypothesis that 
the extent to which individuals choose careers 
which are need-fulfilling and those in which 
they believe they will be adequate is a positive 
function of the self-esteem of the individual 
(Korman, 1966, 1967a). Such differential 
choice patterns have been hypothesized to 
result from tendencies toward “balance” 
where individuals who perceive themselves 
as need-fulfilling and adequate (i.e., have
HSE) choose vocational roles where they will 
have their needs fulfilled and will be adequate. 
On the other hand, situations of self-perceived 
need-fulfillment and adequacy are not “bal- 
anced” situations for those who have LSE; 
hence they do not serve as incentives for them. 

Since the implications of this hypothesis, 
should it continue to be supported, have con- 
siderable importance for counseling processes 
as well as theoretical significance, it was felt 
that further testing of the proposition was 
desirable. A number of further studies were 
undertaken in order both to replicate these 
previous findings using different instruments 
and to extend them to different dimensions of 
the vocational choice process. It is the purpose 
of this paper to report these studies. 

Study 1 consisted of testing the hypothesis 
that HSE individuals who enter a given 
occupation are more likely to describe them- 
selves according to generally given stereo- 


1 Studies 2, 3, and 4 in this paper were presented
at the meeting of the American Psychological As- 
sociation, Washington, D. C., 1967. 

2 Requests for reprints should be sent to the 
author, Department of Psychology, New York Uni- 
versity, 21 Washington Place, New York, New 
York 10003. 


types of that occupation than both LSE 
people who enter that occupation and a 
random sample of those who enter differ- 
ent occupations. A second prediction was 
that there would be no difference between the 
latter two groups. In addition, it was felt 
that this hypothesis would hold for either 
specifically defined occupational choices (e.g., 
sales and accounting, as in our previous research)
or more grossly defined choices (e.g., business
management in general). This latter
aspect constitutes an attempt to extend previous
findings.

The specific predictions which were made 
(in line with the above) were as follows: 
(a) HSE individuals in sales were most likely 
to describe themselves as being “sociable,” 
“talkative,” “aggressive,” and having “initiative”;
(b) HSE individuals in accounting
were most likely to describe themselves as 
being “precise,” “self-controlled,” “organized,” 
and “thorough”; (c) HSE individuals in gen- 
eral business were most likely to describe 
themselves as being “practical,” “rational,” 
and “responsible.” 

Study 2 also attempted to generalize the 
previous research to those whose occupa- 
tional choice was more generalized in nature 
in that the interest here was in those whose 
occupational choice was “the world of busi- 
ness” rather than the more specific roles of 
sales, accounting, personnel, etc. The predic- 
tion was that HSE individuals who had 
chosen business as a career would be different 
from HSE individuals who had chosen some- 
thing other than business as a career in the 
direction of having greater need for material 
security and less need for social service. On 
the other hand, there should be no difference
on these dependent variables between LSE 
individuals who had chosen business and LSE 
people who had not chosen business. 

Study 3 consisted of a replication of Studies 
1 and 2 in the area of “numerical abilities.” 
It was predicted that individuals of HSE 
who had chosen numerically-oriented occupa- 
tions (accounting and/or statistics) would see 
themselves as having higher numerical abilities 
than HSE individuals who had not chosen 
numerically-oriented occupations. On the
other hand, such differences should not exist 
for those of LSE. 

Study 4 proceeded from the assumption 
that the desire to engage in what is perceived 
to be ethical behavior is relatively widespread 
in nature, at least on a conscious self-descrip- 
tive basis. Hence, it was hypothesized that 
for individuals of HSE, the perceived ethi- 
cality of the behavior to be engaged in would 
be predictive of job choice, whereas such 
predictions would break down for those of 
LSE. More specifically, it was predicted that 
HSE individuals who had chosen business 
occupations would rate various business be- 
haviors as being more ethical than HSE 
individuals who had not chosen business oc- 
cupations. On the other hand, for LSE in- 
dividuals, there would be no relationship be- 
tween the judged ethicality of business be- 
haviors and occupational choice. 


METHOD 
Sample 


(a) The Ss for Study 1 consisted of male students 
at a far western state university. Of these, 22 had 
made career commitments to marketing and sales, 
35 to accounting, 36 to general business, and 42 to 
some area unrelated to business. (b) The “business 
career” sample for Study 2 consisted of 65 male 
upper division business school majors at a different 
far western state university who indicated a specific 
commitment to enter the world of business, but 
who had not chosen a specific career such as sales, 
accounting, etc. The “nonbusiness career” sample 
consisted of 58 upper and lower division male 
students in a number of different major areas at 
two eastern universities. Most prominent of the 
major areas were education, the social and biological 
sciences, and social work. (c) The sample for Study 
3 consisted of 67 lower division students at a 
private eastern university who had made a definite 
occupational choice. (d) The sample for Study 4 
consisted of 53 males and 29 females. Since previous 
work by the author has shown a sex difference in
the ethical judgments used in this study, sexes were 
analyzed separately. 


Measuring Instruments 


(a) Self-esteem in all studies was measured by 
the Self-Assurance Scale of the Ghiselli Self-Descrip- 
tion Inventory, with the cutoff in all cases for “high” 
and “low” self-esteem the 50th percentile on the 
nationwide norms. (b) The tendency to describe 
oneself according to certain adjectives in Study 1 
was measured by the Gough Adjective Checklist 
(Gough, 1952). (c) The need for “material security” 
and “social service” in Study 2 were measured by 
the scales of the same name of the Crites Vocation 
Reaction Survey (Korman, 1966). (d) Self-perceived 
“numerical abilities” were measured by the Ability 
Assessment Questionnaire. This is an instrument of 
self-perceived abilities described by Korman (1967a). 
(e) “Occupational choice” was measured by ques- 
tionnaire procedures found in previous research to 
have high reliability and concurrent validity (Kor- 
man, 1966). (f) Judgments of the “Ethicality of 
Business Behavior” in Study 4 were measured by 
having Ss rate, on a 4-point scale, the ethicality of 
25 incidents which have actually occurred in the 
business world in recent years. 


Procedures 


The general procedure in all cases was to ad- 
minister the questionnaires in regular class meetings 
or as part of introductory psychology research par- 
ticipation requirements. No systematic difference has 
been found between any of these procedures. 


RESULTS 
Study 1 


Table 1 presents the results from this study, 
showing in all cases strong support for the 
hypothesis. The HSE individual does, in all 
cases, describe himself more as meeting the 
occupational image in the specific occupation 
than does the LSE individual in the occupa- 
tion or a random sample of individuals who 
have made different occupational choices. 
Furthermore, this occurs no matter which 
occupation is referred to. 

There was one possibly contaminating factor
in these results: perhaps the higher
the self-esteem, the higher the frequency of 
words used to describe oneself. In other 
words, in a free response situation, such as 
the Gough Adjective Check List, where the 
person is asked to describe himself according 
to a set of adjectives with sheer number of 
adjectives chosen uncontrolled for, perhaps 
the only difference between HSE and LSE 

TABLE 1

ADJECTIVE SELF-DESCRIPTIONS ACCORDING TO
OCCUPATIONAL CHOICE AND SELF-ESTEEM

Item     High self-esteem      Low self-esteem       Random sample

Frequency of choosing “Practical,” “Rational,” and “Responsible”

         1. Business majors    2. Business majors    3. Nonbus. majors
M        [illegible]           1.90                  [illegible]
SD       .60                   .84                   .94
N        15                    21                    42
t 1 & 3 = [illegible]
t 2 & 3 = .92

Frequency of choosing “Precise,” “Self-Controlled,” “Organized,” and “Thorough”

         1. Acct. majors       2. Acct. majors       3. Nonacct. majors
M        [illegible]           1.33                  1.60
SD       [illegible]           1.53                  1.43
N        14                    21                    42
t 1 & 3 = 1.79*
t 2 & 3 = .69

Frequency of choosing “Initiative,” “Aggressive,” “Sociable,” and “Talkative”

         1. Sales majors       2. Sales majors       3. Nonsales majors
M        2.43                  1.77                  1.78
SD       1.34                  1.29                  1.13
N        9                     13                    42
t 1 & 3 = [illegible]*
t 2 & 3 = .03

Note. All tests in this table are one-tailed tests.


people is in the total number of words 
chosen. A check for this, using frequency of 
choosing a random sample of 30 adjectives, 
indicated that this was not the case. The 
mean frequency of choice was exactly the 
same for the two self-esteem groups, carried 
out to one decimal place. Hence, this cannot 
explain the results.


Study 2


Table 2 presents the results of this investi- 
gation, with strong support once again being 
indicated for both hypotheses. The HSE busi- 
ness group is significantly higher than the 
HSE nonbusiness group on “material secu- 
rity” and significantly lower on “social 
service.” For the LSE groups neither result 
occurs, with the means actually reversed for 
“material security.” 
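
As an illustration of how the t values in Table 2 relate to the reported means, standard deviations, and Ns, the sketch below recomputes one comparison from the summary statistics. It assumes an independent-samples t test with pooled variance, which the text does not state explicitly; the function name is hypothetical. The inputs are the “social service” figures for the two LSE groups (Columns 2 and 4).

```python
from math import sqrt

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t from summary statistics, pooled variance (assumed test)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error

# Social service, LSE nonbusiness (M = 6.55, SD = 4.2, N = 31)
# versus LSE business (M = 5.53, SD = 3.6, N = 36), as printed in Table 2.
t_lse_social = pooled_t(6.55, 4.2, 31, 5.53, 3.6, 36)
```

The recomputed value is about 1.07, close to the 1.09 printed in the table; the small gap is what one would expect from working with rounded means and standard deviations.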


Study 3 


Table 3 presents the results for this study 
with strong support once again. HSE “quantitative
occupation” individuals see themselves
as having greater numerical abilities than HSE
“nonquantitative occupation” individuals, whereas the
differences are not significant for the LSE
groups. 


TABLE 2

NEEDS FOR MATERIAL SECURITY AND SOCIAL SERVICE
ACCORDING TO SELF-ESTEEM AND
VOCATIONAL CHOICE

Item     1. Business,        2. Business,        3. Nonbusiness,     4. Nonbusiness,
         high self-esteem    low self-esteem     high self-esteem    low self-esteem

Material security
M        7.59                6.08                6.07                6.61
SD       [illegible]         3.6                 [illegible]         3.2
N        29                  36                  27                  31
t 1 & 3 = [illegible]
t 2 & 4 = .65

Social service
M        4.13                5.53                6.89                6.55
SD       2.9                 3.6                 4.4                 4.2
N        29                  36                  27                  31
t 1 & 3 = [illegible]**
t 2 & 4 = 1.09

Note. All tests in this table are one-tailed tests.
*p < .05.
**p < .01.

TABLE 3

SELF-PERCEIVED NUMERICAL ABILITIES ACCORDING
TO SELF-ESTEEM AND VOCATIONAL CHOICE

         1. High             2. Low              3. High             4. Low
         self-esteem         self-esteem         self-esteem         self-esteem
         quantitative        quantitative        nonquantitative     nonquantitative
M        13.00               10.56               8.69                9.52
SD       [illegible]         [illegible]         2.9                 [illegible]
N        7                   16                  23                  21
t 1 & 3 = [illegible]**
t 2 & 4 = [illegible]

Note. All tests in this table are one-tailed tests.
**p < .01.


Study 4 


The data for this study were analyzed by
computing the mean judged ethicality of each
of the 24 incidents for each of the four groups.
All hypotheses were supported. Male HSE
business (n = 13) rated the incidents as
more ethical than male HSE nonbusiness
(n = 10) (sign test, p < .01), whereas there
was no difference between the male LSE
business (n = 17) and nonbusiness (n = 13)
groups (sign test, p = .27). Similarly, female
HSE business (n = 5) were higher than
nonbusiness (n = 8) (sign test, p < .02),
whereas there were no differences for the
female LSE business (n = 7) and nonbusiness
(n = 9) groups (sign test, p = .50).
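
For readers unfamiliar with the sign test used here, a minimal sketch follows. It treats each incident as one paired comparison between two groups' mean ratings and asks how likely it is, by chance alone, that one group would be higher on at least k of n incidents. The count of 19 out of 24 is hypothetical (the article does not report the counts), and the one-sided form is an assumption, since the direction of the tests is not stated.

```python
from math import comb

def sign_test_one_sided(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5): chance of k or more 'wins' in n pairs."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Hypothetical: one group rates 19 of 24 incidents as more ethical than the other.
p_clear = sign_test_one_sided(19, 24)
# Hypothetical: an even split of 12 of 24 incidents.
p_even = sign_test_one_sided(12, 24)
```

With these illustrative counts, 19 of 24 gives a probability well under .01, whereas an even split gives a probability above .50.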


DISCUSSION 


Taking in context the results of the four 
studies reported here and the results reported 
in previous research (Korman, 1966, 1967a),
there is a highly consistent trend of evidence 
which argues that people differing in self- 
esteem choose occupations differently. Bas- 
ically, the high self-esteem person seems to 
look at himself and say “I like what I see 
and I am going to give it its desires and 
needs,” whereas the low self-esteem person 
seems to say, when looking at himself “I do 
not like what I see and I am not going to 
give it its desires and needs.” While this may 
be a slight oversimplification, it seems to 
summarize, in essence, the kinds of results 
found over a wide variety of different instru- 
ments, different samples, and differing levels 
of choice specificity. It is further strengthened
by our related finding that even when the 
LSE individual is provided with fulfillment of 
his desires, it does not lead to satisfaction on 
his part (Korman, 1967b). (However, con- 
tinued fulfillment of his needs might lead to a 
reevaluation of self, and thus change his 
determinants of satisfaction. There is little 
research on this.) 

At least two further questions of interest 
occur here. The first is why this behavior 
occurs. A second is on what basis LSE people 
make vocational choices, if not on the basis 
of need-fulfillment. In terms of the first ques- 
tion, there are, of course, a variety of explana- 
tions ranging from childhood training pat- 
terns not to be contradictory in behavior to 
conceptions of a need for “social reality” com- 
parable to that of physical reality. For both 
of these cases, situations of inconsistency 
would then be anxiety-provoking and, hence, 
to be avoided. 

Turning to the second question, perhaps 
the LSE individual attempts to implement 
the value of an “ideal self” rather than an 
“actual self,” a possibility which would 
generally be consistent with the notion of the 
LSE person as an individual who dislikes 
himself. In addition, it may be that such 
“ideal self” fulfillment is more determinate 
of his job satisfaction than his “actual self”
fulfillment. A second possibility is that his 
behavior may be, at least partially, a 
function of social norms, that is, he may 
choose and be satisfied according to per- 
ceived social norms as to what is desirable 
and what is undesirable. However, since these 
possible explanations are not inconsistent 
with one another, since the relationship be- 
tween self-esteem and persuasibility may be 
more complex than this (cf. Cox & Bauer, 
1964), and since little research is available on 
either of them, such conjectures at this time 
are speculative only. 


3 I am indebted to Mr. Jeffrey Greenhaus for this
suggestion. 


PREDICTIVE POWER OVER TEN YEARS OF MEASURED
SOCIAL SERVICE AND SCIENTIFIC INTERESTS
AMONG COLLEGE WOMEN 1


LENORE W. HARMON 2 


University of Wisconsin-Milwaukee


“Usual occupation” instead of “current occupation” was used as the criterion
to study the predictive validity of the Women’s SVIB. One hundred
sixty-nine women who scored A or B+ on the Social Worker scale and a 
contrasting group of 125 who scored A or B+ on the Laboratory Technician 
scale in 1953-1955 were located in 1966-1967 and asked about their vocational 
history and current vocational commitment. Thirty-nine percent of the former 
group and 36% of the latter group reported “usual occupations” appropriate 
to their SVIB scores; however, 44% and 40%, respectively, reported no “usual 
occupation.” Among those reporting some career commitment, the predictive 
validity of the Women’s SVIB was essentially equal to the validity of the 
men’s form, but the SVIB was of no help in identifying which women would
report career commitment.


The predictive validity of the Strong Voca- 
tional Interest Blank (SVIB) for women 
has never been explored. Studies by Strong 
(1955) and Campbell (1966a) have estab- 
lished the predictive validity of the SVIB for 
men by using eventual occupations as the 
criterion. Virtually all men are employed so 
it is relatively simple to determine whether 
they are employed in occupations predicted by 
earlier SVIBs. 

It is not so easy to pick a criterion for the 
women’s SVIB. Women enter and leave the 
labor force intermittently over their lifetimes. 
Careers are interrupted, temporarily or per- 
manently, because of marriages and families. 
Only 35% of the married women in the 
United States are currently employed al- 
though the probability that a woman will be 
employed in a career some time after her 


1 This paper was presented at the April 1968 con- 
vention of the American Personnel and Guidance As- 
sociation in Detroit, Michigan. 

2 This study was begun at the University of Min- 
nesota and continued after the author’s departure. 
The Student Counseling Bureau, the Center for In- 
terest Measurement Research, and the Director of 
Student Life Studies, Dr. Ralph Berdie, at the Uni- 
versity of Minnesota generously continued to sup- 
port the research. Deanna Berkenpas and Iffat Shah, 
who are research assistants at the Center for 
Interest Measurement Research, were invaluable in 
locating Ss and collecting data. 

Requests for reprints should be sent to the author, 
219 Mitchell Hall, University of Wisconsin, Mil- 
waukee, Wisconsin 53201. 


marriage is increasing (United States De- 
partment of Labor, 1966). While vocational 
planning is more important than ever for 
women, “current occupation” is not an appropriate
criterion against which to validate
vocational planning tools such as the Women’s 
SVIB. 

This study assessed the predictive validity 
of the Women’s SVIB by the method used by 
Campbell (1966a) in which he followed up 
men who had all obtained high scores on one 
SVIB scale, but it differed from Campbell’s 
in that the criterion used was “usual” rather 
than “current” occupation. The women were 
asked to state their usual occupation, whether 
or not they were currently employed or, if em- 
ployed, whether or not they were currently 
employed in that field. Thus, for criterion 
purposes, the busy wife and mother who 
listed her usual occupation as “teacher” was 
classified as a teacher; the woman who was 
currently employed as her husband’s accoun- 
tant in a new business venture but listed her 
usual employment as “librarian” was classed 
as a librarian; the well trained nurse with 8 
yr. of experience who now devotes herself to 
her home and family and answered “none” to 
the question, “What is your usual employ- 
ment?” was classed as a housewife. 


METHOD 


The SVIB profiles of women entering the University
of Minnesota during 1953-1955 were inspected
to locate those who had high scores (A or
B+) on the Social Worker (SW) scale. This scale
was chosen because Layton (1958) found that 25%
of one class of University of Minnesota freshmen
obtained high scores on it, and because social work
is a field women can enter relatively easily. This
choice insured both a reasonable number of cases
to work with and a fair trial for the scale.


TABLE 1

RATE OF RESPONSE OF THE SW AND LT GROUPS

                                       SW group       LT group
Item                                    N     %        N     %

Completed questionnaires received      169    55      125    51
No response                             76    25       75    31
Never located                           64    20       43    18
Total                                  309   100      242   100

For contrast, another group with high scores (A 
or B+) on the Laboratory Technician (LT) scale 
were also identified. This scale was selected because 
the Social Worker group scored lowest on it; the 
correlation between the two scales is —.65 (Camp- 
bell, 1966b). No Ss overlapped between the SW & 
LT groups. 

The SW group originally included 309 women, and 
the LT group 242. Locating the women was difficult 
because most of them had married and moved from 
the addresses at which they had lived as students. 
Questionnaires were systematically mailed to all the 
former residences listed in university records for 
each woman in the SW group in the spring of 1966 
and to women in the LT group in the spring of 
1967. After one follow-up letter, the returns were as
reported in Table 1. The Never Located group includes
women for whom no address at which mail
was accepted was ever found. The No Response 
group includes women for whom mail was accepted 
at one of the addresses available; however, there was 
no assurance that mail was actually forwarded since 
there was no response from these addresses. These 
returns are lower than Campbell’s 77% in the study 
of men with high scores on Life Insurance Salesmen, 
but locating women is more difficult because they 
change their names. 


RESULTS 


The mean SVIB profiles for the SW and 
LT groups are in Figure 1. For the SW group, 
the Speech Therapist and Music Performer 
scales were nearly as high as the Social 
Worker scale, the Medical and Physical 
Science groups were rejected, and the pre- 
marital pattern of high scores on the House- 
wife, Office Worker, Steno-Secretary scales 
(Layton, 1958) was not evident. For the LT 
group, scores on the Physical Therapist and 
Nurse scales were actually higher than scores 
on the Laboratory Technician scale; all high 
scores were in Health Service and Medical 
Science occupations; the Music, Verbal-Lin- 
guistic, Social Service, and Sales occupations 
were rejected, and the premarital pattern was 
not evident in this group either. 

For each group the questionnaire responses
regarding major in college, usual occupation,
and current occupation were classified intuitively
into the following categories for the
SW group: (a) Social Work, (b) Teaching,
(c) Medical Service, (d) Other Social Service,
(e) Unrelated Professional, and (f) Clerical
and Skilled. For the LT group, the comparable
categories were: (a) Medical Technology, (b)
Other Medical, (c) Other Scientific and
Mathematical, (d) Unrelated Professional,
and (e) Clerical and Skilled.


FIG. 1. Mean SVIB profiles of the SW and LT groups.


TABLE 2

COLLEGE MAJORS OF THE SOCIAL WORK GROUP

                             Career             Homemaker             Total
Major                     N    %   Cum %      N    %   Cum %      N    %   Cum %

Social work              18   19     19      12   16     16      30   18     18
Teaching                 27   29     48      20   27     43      47   28     46
Medical service          17   18     66       2    3     46      19   11     57
Other social science      7    7     73       4    5     51      11    6     63
Other-unrelated          16   17     90      13   17     68      29   17     80
None                      9   10    100      24   32    100      33   20    100
Total                    94  100             75  100            169  100

Categories a-d for the SW group included
occupations which are compatible with high 
scores on the Social Worker scale. These 
scores reflect an interest in direct, helping re- 
lationships such as the practice of social 
work, teaching, nursing, and other occupations 
like speech therapy and clinical psychology. 

Similarly, Categories a-c for the LT group
included occupations which are compatible 
with high scores on the Laboratory Tech- 
nician scale. These scores reflect scientific 
and mathematical interests and preferences 
for indirect helping relationships such as the 
practice of medical technology, X-ray tech- 
nology, and pathology. 


Tables 2 and 3 show the college majors for 
the SW and LT groups separated into Career 
and Homemaker subgroups. The Career sub- 
groups included those who listed a usual oc- 
cupation, even if they were not currently em- 
ployed, or were currently employed in some 
other job. The Homemaker subgroups included
those who listed no usual occupation
even if they were currently employed. 

If one is willing to grant that a college 
major in any of the first four categories in 
Table 2 is consistent with a high Social 
Worker score, 63% of the SW group were 
enrolled in consistent college majors; the 
percentage is higher (73%) in the Career 
subgroup. Analogously, in Table 3, 56% of 
the LT group were enrolled in college majors 
consistent with their high scores on the LT 
scale, 63% of those in the Career subgroup. 

Tables 4 and 5 show the usual and current 
occupations of the SW and LT groups. Nine 
percent of each group was perfectly pre- 
dicted, that is, became either Social Workers 
or Laboratory Technicians. 


TABLE 3

COLLEGE MAJORS OF THE LABORATORY TECHNICIAN GROUP

                             Career             Homemaker             Total
Major                     N    %   Cum %      N    %   Cum %      N    %   Cum %

Med tech                 12   16     16       4    8      8      16   13     13
Other medical            26   35     51      11   22     30      37   30     43
Other science, math       9   12     63       7   14     44      16   13     56
Other-unrelated          25   33     96      15   30     74      40   32     88
None                      3    4    100      13   26    100      16   13    101
Total                    75  100             50  100            125  100


TABLE 4

OCCUPATIONS OF THE SW GROUP

                                       Career                     Homemaker                Total
                          Current job        Usual job           Current job       Current job        Usual job
Job category              N    %  Cum %     N    %  Cum %       N    %  Cum %     N    %  Cum %     N    %  Cum %

Social work               6    6     6     16   17    17        -    -     0      6    4     4     16    9     9
Teaching                 21   22    28     30   32    49        1    1     1     22   13    17     30   18    27
Medical service          11   12    40     17   18    67        1    1     2     12    7    24     17   10    37
Other social service      2    2    42      3    3    70        -    -     2      2    1    25      3    2    39
Unrelated professional   13   14    56     16   17    87        2    3     5     15    9    34     16    9    48
Clerical, skilled         9   10    66     12   13   100        2    3     8     11    6    40     12    7    55
Unemployed               32   34   100      -    -     -       69   92   100    101   60   100     75   44    99
Total                    94                94                  75               169               169

Note.-Percentages may total 99 or 101 due to rounding.


When the expanded criterion of “usual oc- 
cupation” was used and all social service or 
scientific occupations were regarded as ap- 
propriate, the percentages increased to 39 
and 36; among those with career commit- 
ments, the percentages were yet higher, 70 
and 61. 
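The percentages just quoted can be checked against the Usual job columns of Tables 4 and 5. The sketch below is illustrative only: the counts are entered as they are best read from those tables, and the helper names are hypothetical. It computes each figure two ways, directly from the summed counts and by cumulating rounded category percentages as the tables do, which appears to be why the LT career figure is reported as 61 rather than 60.

```python
def pct(n, total):
    """Percentage rounded half up to a whole number."""
    return int(100 * n / total + 0.5)


def hit_rates(appropriate_counts, group_n):
    """Return (direct %, cumulated rounded %) for the appropriate categories."""
    direct = pct(sum(appropriate_counts), group_n)
    cumulated = sum(pct(n, group_n) for n in appropriate_counts)
    return direct, cumulated


# Usual-job counts in the categories treated as appropriate (assumed readings).
SW_APPROPRIATE = [16, 30, 17, 3]  # social work, teaching, medical, other social service
LT_APPROPRIATE = [11, 26, 8]      # med tech, other medical, other scientific-math

sw_all = hit_rates(SW_APPROPRIATE, 169)    # whole SW group
sw_career = hit_rates(SW_APPROPRIATE, 94)  # SW Career subgroup only
lt_all = hit_rates(LT_APPROPRIATE, 125)    # whole LT group
lt_career = hit_rates(LT_APPROPRIATE, 75)  # LT Career subgroup only

print(sw_all, sw_career, lt_all, lt_career)
```

Restricting the denominator to the Career subgroup is what raises the figures from 39 and 36 to roughly 70 and 60.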

The sets of categories for the SW and LT 
groups overlap. For instance, medical occupa- 
tions appear in both because people can enter 
medical occupations out of social service in- 
terests, scientific interests, or some combina- 
tion of the two. 

Table 6 illustrates that there is a qualitative
as well as a quantitative difference between











the SW and LT groups in the medical oc- 
cupations which they list as their usual oc- 
cupations. All of the members of the SW 
group who listed medical occupations chose 
occupations involving direct patient contact 
in which human relationships are of primary 
importance. Over half the LT group who 
listed medical occupations chose occupations 
in which patient contact is minimal or in- 
direct, and scientific procedure is more im- 
portant than human relationship. Thus the 
overlap between the medical occupations ac- 
cepted as compatible with social service and 
scientific interests is not as great as it might 
seem; neither is it nonexistent. 


TABLE 5

OCCUPATIONS OF THE LT GROUP

                                       Career                     Homemaker                Total
                          Current job        Usual job           Current job       Current job        Usual job
Job category              N    %  Cum %     N    %  Cum %       N    %  Cum %     N    %  Cum %     N    %  Cum %

Med tech                  7    9     9     11   15    15        -    -     0      7    6     6     11    9     9
Other medical            11   15    24     26   35    50        -    -     0     11    9    15     26   21    30
Other scientific-math     4    5    29      8   11    61        -    -     0      4    3    18      8    6    36
Other professional       15   20    49     24   32    93        5   10    10     20   16    34     24   19    55
Clerical, skilled         5    7    56      6    8   101        4    8    18      9    7    41      6    5    60
Unemployed               33   44   100      -    -     -       41   82   100     74   59   100     50   40   100
Total                    75                75                  50               125               125

Note.-Percentages may total 99 or 101 due to rounding.




Tables 4 and 5 show that when “usual oc- 
cupation” is employed as a criterion, predic- 
tive accuracy is lowered because a large per- 
centage of women have no usual occupation. 
Forty-four percent of the SW group and 40% 
of the LT group were homemakers who 
claimed no usual occupation even though some 
of them were trained and had experience in 
professional level occupations. Actually 20 of 
the SW Homemaker group and 16 of the LT 
Homemaker group had been employed previously
in social service or scientific occupations.
However, they no longer expressed any
commitment to these careers. Many of them 
had worked at various jobs unrelated to social 
service or science since that employment. 
Having once worked in a field is not an ap- 
propriate criterion of vocational interest, even 
though using it would mean adding 12 or 13% 
to the accuracy of prediction which could be 
claimed. 

If a way could be found to differentiate the
women who would eventually develop career
commitments from those who would retire
permanently from employment, the predictive
accuracy of specific scales could be increased.
That is, if a counselor could identify which
60% of the women who get high scores on
the Laboratory Technician scale when they
enter college would develop career commitments,
he could predict that about 15% of
them would choose Medical Technology and
61% of them would choose scientific occupations.
These levels of accuracy approach the
ones found for scales of the Men’s SVIB
(Campbell, 1966a; Strong, 1955).


TABLE 6

MEDICAL OCCUPATIONS a OF THE SW AND LT GROUPS

                                  SW group             LT group
Medical occupation              N   % of total       N   % of total
                                      group                group

Nurse                          12       7           10       8
Med technologist                -       -           11       9
X-ray technologist              -       -            4       3
Physical therapist              -       -            4       3
Occupational therapist          4       2            4       3
Pathologist (M.D.)              -       -            2       2
Veterinarian                    -       -            1       1
Medical records librarian       -       -            1       1
Dental hygienist                1       1            -       -
Total                          17      10           37      30

a Usual occupations.

Figure 2 shows that the mean SVIB profiles
for the Career and Homemaker groups
were quite similar. The Housewife, Academic
Achievement, and Masculinity-Femininity
scales, which might be expected to differentiate
the subgroups, did not do so. An earlier study
(Harmon, 1967) suggested that the SVIB
Housewife scale was not predictive of women’s
career patterns. Comparable data for the LT
subgroups also show a striking similarity between
the Career and Homemaker groups.
There is no way of differentiating between
the Career and Homemaker groups on the
basis of SVIB profile.


FIG. 2. Mean SVIB profiles of the SW career and homemaker groups.


CONCLUSIONS 


“Usual occupation” is a fair criterion for 
studying the predictive validity of vocational 
interest tests for women. It avoids the prob- 
lems of using “current occupation” or “any 
experience in the occupation” as criteria. For 
women who are committed to careers, that is, 
those who claim some “usual occupation,” 
the Social Worker and Laboratory Technician 
scales of SVIBs taken 10-14 yr. earlier ac- 
curately predicted commitment to social ser- 
vice or science occupations 70 and 61% of 
the time, respectively. For the 40-45% of 
women who are homemakers with no career 
commitment, the Social Worker and Labora- 
tory Technician scales of SVIBs taken 10-14 
yr. earlier were, by definition, not predictive. 
Their SVIB profiles did not differ much from 
those of women with career commitments. 

It would be interesting to determine what 
kind of women develop career commitments 
and which do not. This should be done but, 
from a practical point of view, it is more important
to determine the differences between
occupational interests because the women who 
utilize testing to help make decisions are 
usually choosing among vocational plans. The 
woman who prepares for a career may decide 
to follow it or not. The woman who does not 
prepare has no choice. The counselor who 
uses the Women’s SVIB as though each high 
score was predictively accurate for all women 
rather than just for those who become career 
women will leave more doors open for future 
choices by his client.


COGNITIVE, NONCOGNITIVE, AND ENVIRONMENTAL
CORRELATES OF MECHANICAL INGENUITY 1


W. A. OWENS 2 


University of Georgia 


This is a 1964 follow-up study of 1500 engineering students originally ad- 
ministered four tests of mechanical creativity during 1955. Analyses were 
devoted to the criterion validity of the original battery, the job environment 
and life history correlates of outstanding performance, and the possibility of a 
Type of Person X Optimal Environment interaction. Results suggest (a) a
validity coefficient of .40 for the tests of 1955; (b) factors of academic
underachievement and research orientation of supervision as important personal
and environmental correlates of performance; and (c) no Type of Person X
Optimal Environment interaction.


The construction of a special battery of 
tests designed to discriminate creative from 
noncreative or development engineers and 
the subsequent validation of these tests on an 
independent sample of 304 industrially em- 
ployed engineers was reported by Owens, 
Schumacher, and Clark in 1957. The final 
form of the battery which survived cross- 
validation consisted of the Personal Inventory 
(PI), a quasi forced-choice inventory dealing 
with interests, attitudes, opinions, personal 
characteristics, and experiences; the Personal 
History Form (PHF), a single sheet dealing 
with personal background; the Application 
of Mechanisms Test (AMT); solutions to the 
Power Source Apparatus Test judged to be 
workable (PSA-W), and total number of solu- 
tions to the PSA (PSA-T). Conclusive demon- 
stration of the predictive efficiency of the 
battery, however, required a longitudinal de- 
sign which would permit the accumulation 
of evidence of creative performance over a 
number of years. With this purpose in mind, 
the final battery was administered in 1955 
to over 1500 juniors and seniors in the me- 
chanically-related branches of engineering at 
25 colleges and universities. The security of
the test scores was maintained while criterion
information was accumulating, and an account
of the relationship between the predictions of
1955 and the actualities of 1964 follows. An
attempt has also been made to recognize some
of the moderating influences involved.


1 The author wishes to recognize the substantial
contributions of Michael Brodie, Maureen Kallick,
Stephen P. Klein, Richard Klimoski, Robert B.
Means, and Mark Van Slyke.

2 Requests for reprints should be sent to the author,
Department of Psychology, University of
Georgia, Athens, Georgia 30601.


PURPOSES AND HYPOTHESES


Specifically, attention has been directed to 
three questions: (a) What is the evidence 
regarding the predictive validity of the 1955 
measures of creativity in machine design? 
(b) What are the personal (noncognitive)
and environmental characteristics which have 
facilitated or inhibited the expression of this 
creativity in the meantime? and (c) If we 
identify types of persons and types of environ- 
ments, do they interact; that is, is it true that 
one type of environment is optimal for per- 
sons of Type A and another for persons of 
Type B? Attendant upon, and congruent with, 
the preceding purposes, the following hy- 
potheses were formulated: (a) The present 
cognitive tests are better predictors of crea- 
tivity in machine design than a common 
mental ability (or scholastic aptitude) test; 
this is, at least in part, because the one places 
premium upon a different cognitive pattern 
or style than the other (Guilford, 1959). 
(b) The creative individual is cognitively
complex and can integrate more inputs than 
his less creative fellows; thus, tests which re- 
strict or structure more, and which imply 
more inputs (PSA), will be superior to those 
which restrict or structure less (AMT). (c) 
Accepting the phenomenal nature and complex
determination of creativity, prediction
will be enhanced by appraising not only cognitive
characteristics, but noncognitive and
environmental characteristics as well. (d)
Since creative persons do not all appear to
respond to the same contexts or to function
in the same way, an interaction between personal
and environmental determiners of creativity
is postulated. Thus Environment X
may be optimal for persons of Type A and
Environment Y for persons of Type B.


FIG. 1. A comparison of the shapes of distributions on the PSA-W test for Groups 1, 2, and 3.


METHODS


In overview the methods adapted to the above 
purposes are as follows: (a) Correlational analyses 
were utilized to answer Question 1 regarding the 
predictive validity of the 1955 battery. (b) To
answer Question 2, inventories of personal (non- 
cognitive) and environmental characteristics were 
completed by Ss in 1964; the items of each were 
factored, and factor scores were then correlated 
with the creativity criterion. (c) Question 3 was 
attacked by successively subgrouping Ss: first
on the basis of the profile similarities of their job
environment factor scores, and then on the basis of 
the profile similarities of their personal character- 
istics factor scores. The two classes of subsets then 
became the two criteria of classification for an 
analysis of variance in which the interaction would 
represent Type of Person X Type of Environment. 


Subjects 


The potential pool of Ss for the follow-up of 
1964 consisted of 1537 students of 1955 who were, 
typically, originally tested as beginning upperclassmen 
in the mechanically-related branches of engineering. 
The vast majority were in civil, aeronautical, or 
mechanical curricula, and they were from all parts 
of the country and a wide variety of institutions. 
Alumni offices of the schools in question were able 
to supply 1268 addresses. Of these, 109 proved to
be insufficient or incorrect and no forwarding address
could be obtained. Thus, 1159 Ss were
actually contacted and 938 (81%) ultimately replied.
Since seven returns were unusable, 931 Ss
were involved in this study. These were assigned to
one of the following three categories.

Group 1 was composed of 457 engineers employed
in research and development (R & D). Group 2 consisted
of 104 former R & D engineers promoted into
engineering management (EM). Group 3 was composed
of 370 Ss in two loosely defined subgroups:
first, those not in engineering per se, but
in an engineering-related occupation, such as teaching
engineering or sales engineering; second, those in
an unrelated area, such as medicine, law, or the
ministry.


TABLE 1

CRITERION ELEMENTS: THEIR MEANS, STANDARD DEVIATIONS,
AND RELATIVE IMPORTANCE WEIGHTS

                                                          Group 1
LHQ item                                             M           SD     Wt.

104. a. Improving products or processes             5.28        11.44    1
104. b. Developing products or processes            3.57         9.29    1
114. Papers presented at professional meetings      1.27         0.68    2
121. Papers published in professional journals   [illegible]     2.07    2
126. In S’s own name:
       a. Patents held                              0.06         0.32    3
       b. Patents pending                           0.21         0.81    3
       c. Patent disclosures                        0.42         1.50    4
127. With contribution by S:
       a. Patents held                              0.12         0.68    1
       b. Patents pending                        [illegible]     0.62    1
       c. Patent disclosures                        0.14         0.64    2

Figure 1 essentially involves a comparison of the 
shapes of the test score distributions of the three 
groups of Ss on the single most valid measure. 
However, in speaking of the predictive purposes of 
the study, it must be clear that Ss, by groups, 
have not had an equal opportunity to accumulate 
criterion evidence of their creativity. Attention has, 
therefore, been centered on Group 1 (R & D).


Measuring Instruments


In the follow-up of 1964, two inventories and a 
covering letter were sent to each S. Many investi- 
gators have reported relevant and favorable experi- 
ence with the biographical information blank, or 
BIB, as a predictor (Owens & Henry, 1966). Ac- 
cordingly, the first inventory was a so-called Life 
History Questionnaire (LHQ) of 181 items dealing 
with Ss’ demographic characteristics and experiential
background plus 10 criterion items (Klein,
1965). Pursuant to a suggestion by Taylor (1964), 
the second was a Job Environment Survey (JES) 
of 80 items covering the “research climate” in which 
S worked (Kallick, 1964).


Criterion


The 10 items of criterion information collected 
appear in Table 1 along with their means, standard 
deviations, and relative importance weights. These 
weights were assigned by five members of the engi- 
neering faculty at Purdue all of whom had had 
both academic and industrial experience relevant to 
the design and development of products or processes. 
In the context of their importance to these activities, 
the weights were assigned to each criterion element 
by each judge in accordance with the Kelly “bids sys- 
tem” (see Toops, 1944). The criterion score of a 
given S, then, was his standard score on the given 
criterion element, multiplied by the appropriate im- 
portance weight and summed across the 10 elements. 
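The scoring rule described above (standardize each criterion element, multiply by its importance weight, sum across elements) can be sketched as follows. This is a minimal illustration rather than the original scoring program: the function name, the toy data, and the toy weights are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, pstdev


def composite_criterion(raw_scores, weights):
    """Weighted sum of standard scores for each subject.

    raw_scores: one list per subject, one raw value per criterion element.
    weights:    one importance weight per criterion element.
    """
    n_elements = len(weights)
    columns = [[row[j] for row in raw_scores] for j in range(n_elements)]
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # assumption: population SD
    composites = []
    for row in raw_scores:
        total = 0.0
        for j in range(n_elements):
            # An element with no variance contributes nothing to the composite.
            z = (row[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0
            total += weights[j] * z
        composites.append(total)
    return composites


# Toy data: three hypothetical subjects scored on two criterion elements.
toy_scores = [[0, 2], [2, 2], [4, 5]]
toy_weights = [1, 3]
scores = composite_criterion(toy_scores, toy_weights)
print(scores)
```

Because each element is standardized before weighting, an element with a large raw variance (such as the count of improved products) does not dominate the composite merely because of its scale.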


Statistical Treatment 


Treatment of the data in the context of the three 
purposes originally stated would probably be most 
intelligible. Thus the first purpose was to answer 
certain questions regarding the predictive validity of 
the creativity battery of 1955. As noted in the 
methods overview, these questions were answered 
in terms of correlational analyses of conventional 
character. 

The second purpose was to identify some of the 
personal (noncognitive) and environmental char- 
acteristics which have facilitated or inhibited the 
subsequent expression of the creativity measured by 
the battery of 1955. This purpose was served by 
factoring independently the items of the JES and 
the LHQ and by relating factor scores on the fac- 
tors obtained to the composite criterion of creativity. 
In the case of the JES most of the items are an- 
swered as applying or not applying (1 or 0) to the 
given S’s working situation. However, since some 
permit multiple responses, a total of 109 options 
were available for analysis. No continuum-type items 
were included, and 10 binary items were eliminated 
because they yielded response frequencies below 
10%, thereby enhancing the risk of obtaining diffi- 
culty factors. Thus, 99 item-options (hereafter 
“items”) ultimately entered the factor analysis.
Twenty-one factors were initially extracted from 
the matrix according to the method of principal 
components. A decision to rotate four of these was 
based on a plot of the latent roots. Following an 
orthogonal rotation, all of the items which did not 
load at least .20 on one of the rotated factors were 
eliminated as were most items which loaded nearly 
equally across several factors. This procedure was 
then repeated until nearly half of the original items 
had been dropped. The technique was adopted for 
two reasons: (a) It was felt that it would sharpen 
interpretations regarding the differential criterion 
relevance of various environmental influence factors; 
(b) It was planned to fulfill the third purpose of 
this study by classifying Ss on their factor score 
profiles in a fashion requiring the independence of 
the basic dimensions. In anticipation of this next 
step, standardized factor scores were computed for 
each individual on each of the ultimate job environment
factors.

An identical procedure was utilized with the
items of the LHQ, with the following two exceptions. 
(a) The LHQ contains both continuum-type and 
binary items; however, the latter have several 
options under the same stem from which only one 
is to be selected. The options are, thus, not invariably 
independent and only one from a common stem 
could be permitted to enter the factor analysis. 
(b) Since mixing binary and continuum-type items
leads to ambiguities, it was decided to factor the 
binary items first and to include binary factor scores, 
along with scores on continuum-type items as entries 
in the matrix to be factored. When the final set of 
three personal history factors had been identified, 
standardized factor scores were computed for each 
S on each factor. It was then a simple matter to 
fulfill the second purpose regarding the relevance of 
personal and environmental factors to creativity by 
correlating scores on each of the two sets of factors 
with the composite criterion. 
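The item-screening rules applied to the JES items, and reused for the LHQ, can be illustrated with a short sketch. The 10% response-frequency floor and the .20 minimum loading come from the text; the rule for “nearly equal” loadings is not specified there, so the margin of .05 used below is a hypothetical stand-in, as are the function names and the example loadings.

```python
def keep_by_frequency(endorsement_rate, floor=0.10):
    """Keep a binary item only if at least `floor` of Ss endorsed it."""
    return endorsement_rate >= floor


def keep_by_loadings(loadings, min_loading=0.20, margin=0.05):
    """Keep an item whose largest absolute rotated loading reaches
    min_loading and exceeds the next largest by more than `margin`
    (the margin is an assumed rule, not taken from the study)."""
    ranked = sorted((abs(x) for x in loadings), reverse=True)
    if ranked[0] < min_loading:
        return False
    if len(ranked) > 1 and ranked[0] - ranked[1] <= margin:
        return False
    return True


# Hypothetical rotated loadings on four factors.
items = {
    "item_a": [0.45, 0.05, -0.10, 0.02],  # clear marker of the first factor
    "item_b": [0.12, 0.08, 0.15, -0.05],  # below .20 on every factor
    "item_c": [0.31, 0.30, 0.04, 0.01],   # split between two factors
}
retained = [name for name, row in items.items() if keep_by_loadings(row)]
print(retained)
```

In the study the rotation and screening were repeated until nearly half of the items had been dropped; the sketch shows a single screening pass only.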

Finally, the third purpose had reference to testing 
the possibility of a Type of Person X Type of 
Environment interaction. To test it required that Ss 
be subgrouped, successively, on each set of factors 
and assigned to appropriate cells in an analysis of 
variance design with two criteria of classification. 
On the job environment factors, for example, this 
was accomplished as follows. Each S’s factor scores 
were regarded as comprising a profile. The similarity 
of each profile to every other profile was expressed 
in terms of the D² statistic of Cronbach and Gleser
(1953). Subgrouping was then accomplished through
application of the hierarchical procedure of Ward
and Hook (1963) to the obtained matrix of D²
values. The technique essentially compares each 
profile with every other, combines the most similar 
pair, treats them as a unit, and repeats the process. 
The criterion is the minimization of within-groups 
variance and the machine output indicates the in- 
crease in this variance which accompanies each 
reduction in the number of subgroups. Since the 
operation is sequential and the decision regarding 
the ultimate number of groups judgmental, no 
claim is made that the overall solution is optimal. 
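The grouping logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the original machine program: D² is taken as the sum of squared differences between corresponding profile elements, the merge criterion is the increase in the within-groups sum of squares, and the function names and toy profiles are hypothetical.

```python
def d_squared(p, q):
    """D^2 between two profiles: sum of squared element differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(profiles):
    """Mean profile of a list of equal-length profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def ward_grouping(profiles, n_groups):
    """Repeatedly merge the pair of groups whose union adds least to the
    within-groups sum of squares; return the groups (as lists of profile
    indices) and the increase recorded at each merge."""
    groups = [[i] for i in range(len(profiles))]
    increases = []
    while len(groups) > n_groups:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                ci = centroid([profiles[k] for k in groups[i]])
                cj = centroid([profiles[k] for k in groups[j]])
                ni, nj = len(groups[i]), len(groups[j])
                # Increase in within-groups sum of squares if i and j merge.
                cost = ni * nj / (ni + nj) * d_squared(ci, cj)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]
        increases.append(cost)
    return groups, increases


# Toy factor-score profiles for five hypothetical Ss.
toy_profiles = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.1], [3.2, 2.9], [0.1, -0.1]]
groups, increases = ward_grouping(toy_profiles, 2)
print(groups, increases)
```

The list of increases plays the role of the machine output mentioned in the text: a sharp jump from one step to the next suggests that the preceding number of subgroups is a reasonable stopping point, though the choice remains a matter of judgment.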

For these reasons a special evaluation program 
was written (Brodie, 1966) to follow the Ward-Hook. 
Via the D² statistic, this program compares each S’s profile
with the mean profile of each potential subgroup 
and permits several important outcomes: (a) The 
Ss may be reassigned to a new subgroup if this is 
indicated; (b) the Ss who equally resemble two or 
more subgroups within specified limits may be placed 
in a residual group; and (c) errors made at several 
grouping levels may be used as one objective criterion 
for determining the optimum number of groups. 
The evaluation program was applied to the present 
data and the results were used to serve the three 
purposes indicated. 
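A single evaluation pass of the kind just described might look like the sketch below. The details of the original program are not given in the text, so everything here is an assumption made for illustration: the function names, the tie limit of .05, the label "residual", and the toy data are all hypothetical.

```python
def d_squared(p, q):
    """D^2 between two profiles: sum of squared element differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_profile(profiles):
    """Mean profile of a list of equal-length profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def evaluate_assignment(profiles, labels, tie_limit=0.05):
    """Compare each profile with the mean profile of every subgroup.

    Returns (new_labels, n_moved). A profile whose two smallest D^2 values
    differ by no more than tie_limit is labeled 'residual'; otherwise it
    takes the label of the nearest subgroup mean.
    """
    names = sorted(set(labels))
    means = {
        g: mean_profile([p for p, lab in zip(profiles, labels) if lab == g])
        for g in names
    }
    new_labels, n_moved = [], 0
    for profile, old in zip(profiles, labels):
        ranked = sorted((d_squared(profile, means[g]), g) for g in names)
        if len(ranked) > 1 and ranked[1][0] - ranked[0][0] <= tie_limit:
            new_labels.append("residual")
        else:
            new_labels.append(ranked[0][1])
            if ranked[0][1] != old:
                n_moved += 1
    return new_labels, n_moved


# Toy profiles: the fifth S is misassigned and the sixth sits between groups.
toy_profiles = [[0.0, 0.0], [0.2, 0.0], [3.0, 3.0],
                [3.2, 3.0], [2.8, 3.0], [1.82, 1.82]]
toy_labels = ["A", "A", "B", "B", "A", "B"]
new_labels, n_moved = evaluate_assignment(toy_profiles, toy_labels)
print(new_labels, n_moved)
```

Counting the reassignments at each candidate number of subgroups gives one objective index of grouping error of the sort the text mentions for choosing the number of groups.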

When Ss had been subgrouped in the indicated 
fashion on the basis of the job environment factors, 
the same procedure was repeated employing the life 
history factors. The former subgroups were then 
regarded as representing levels of a “job climate”
factor and the latter as levels of a type of person
factor in an unequal cell-size, weighted means, 
analysis of variance design. Cell entries were com- 
posite criterion scores for a given type of S exposed 
to a given climate. An interaction between these 
two criteria of classification would then be identifi- 
able and would speak directly to the third purpose 
of the study (see Means, 1966). 


RESULTS 


In attempting to evaluate the predictive 
validity of the creativity battery of 1955, 
two preliminary issues arise. First, before 
examining the relationship of predictors to 
criteria within a selected group (R & D), it 
seems appropriate to inquire as to the dis- 
tributions of the three groups on the single 
best predictor, PSA-W. Since the distributions 
concerned are skewed, their means and vari- 
ances do not define them adequately. The 
complete scatter plots have been included as 
Figure 1. It should be noted that two different 
scales are employed on the y axis in order to 
make the areas under the curves more nearly 
comparable. Given this, there is an apparent 
tendency for Group 3 to show the lowest 
scores and Group 1 the highest, as expected. 
There may also be some tendency for Group 2 
to score more like 3 than 1, a result which 
might or might not have been anticipated. 

Second, before examining in greater detail 
the issue of how well the creativity battery 
predicts, it seems a prerequisite that it predict 
with some uniqueness. The question most 
commonly raised concerns the independence 
of what is measured by general mental ability 
tests from what is measured by creativity 
tests. In the present instance it was found 
possible to obtain scores on the American 
Council on Education Psychological Examina- 
tion (ACE) for 167 of the 457 R & D engi- 
neers. The test had been administered to 
them as college freshmen some 2 yr. prior to 
the time when they completed the creativity 
battery. The top row of Table 2 contains the 
correlations of the ACE with the cognitive 
predictors of the creativity battery and with 
the composite criterion. It will be noted that 
none of the former is of substantial magnitude 
and that the latter is near zero. On the other 
hand, those scores derived from the special 
battery are significantly, if modestly, correlated 
with the composite criterion. The result 
should not be overinterpreted, but it does 
suggest reasonable independence of creativity 
from general mental ability within the present 
sample. 

To evaluate the predictive power of the 
creativity battery over time, reference is made 
to Column 5 of Table 2 which contains 
minimum estimates of the relationships in- 
volved. It will be observed that only the PSA 
test and the PHF are effective predictors. 
Both the AMT, which looked promising in 
concurrent validation, and the PI, which had 
appeared to be making a marginal contribu- 
tion, failed to correlate significantly with the 
criterion in the more demanding predictive 
context. In any event, the estimates of 
Column 5 can, realistically, be increased if 
consideration is given to three qualifying 
influences. 

First, unambiguous criterion data are avail- 
able only for the R & D group. However, if 
the battery were administered as an employ- 
ment test immediately following graduation, 
the range of scores would be wider because 
both Groups 1 (R & D) and 2 (EM) would 
be included. If it were administered as engi- 
neering students became upperclassmen, to 
serve as a sectioning device, the range would 
be still wider because Group 3 would also be 
included. Even in the first case, the zero-order 
criterion correlations of Table 2 would be 
.02 to .03 points higher in the more variable 
sample (see Thorndike, 1949, pp. 169-173). 
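
The size of such an adjustment can be illustrated with the familiar correction for direct restriction on the predictor. The sketch below assumes that this is the relevant case; the text does not state which of Thorndike's formulas was applied, and the ratio of standard deviations (1.10) is hypothetical.

```python
import math


def correct_for_range_restriction(r, sd_ratio):
    """Estimate the correlation expected in a more variable group.

    r        : correlation observed in the restricted group
    sd_ratio : predictor SD in the wider group divided by the
               predictor SD in the restricted group
    """
    k = sd_ratio
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)


# Hypothetical illustration: observed validity .25, SD ratio 1.10
estimate = correct_for_range_restriction(0.25, 1.10)
```

Under these assumed values the estimate is roughly .27, an increase of the same order as the .02 to .03 points cited above. A ratio of 1.00 leaves the coefficient unchanged.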


TABLE 2 


PREDICTIVE VALIDITIES AND INTERCORRELATIONS OF 
THE ACE AND SEVERAL SPECIALLY 
DEVISED PREDICTORS 








Predictor 2 3 4 5 





ACE (WN = 167) Oe |beo2. alee Uae |e 
Application of Mechanisms 

Test .14* | .40* | .06 
Power Source Apparatus 

(PSA-W) oS 8 WAS 
Power Source Apparatus 

(PSA-T) woe 
Composite Criterion 
Personal History Form aloe 

Note.—N = 457. 

*p < .01. 
TABLE 3 


POINT-BISERIAL CORRELATIONS OF PSA-W SCORE 
WITH THE COMPOSITE CRITERION 


PSA-W raw score | Centile level of dichotomy | r(pbis) | SE 
       0        |             95             |   .24   | .09 
      12        |             90             |   .34   | .07 
      11        |             85             |   .33   | .07 
      10        |             80             |   .29   | .06 


Second, and for two reasons, the product- 
moment correlation provides a poor estimate 
of the critical predictor-criterion relationship 
for these data. This is because (a) the cri- 
terion distribution is extremely positively 
skewed with many values of zero or near zero 
and (b) it is not the task of the present 
battery to discriminate throughout the range 
of scores, but only at its upper end. A priori, 
it seems clear that whatever fraction of engi- 
neers is to be regarded as truly creative, it 
probably does not exceed 5-20%. Accord- 
ingly, a decision was made prior to analysis 
of the data to examine several biserial cuts at 
several test score levels versus the continuous 
criterion. Levels selected were the highest 5%, 
10%, 15%, and 20% with the middle two 
to be regarded as most critical. Scores selected 
were those on the single best predictor, the 
PSA-W test. Table 3 contains the results and 
shows the point-biserial correlation between 
test and criterion as a function of the level 
at which the former variable is dichotomized. 
It will be recalled that a correlation of .33 or 
.34 is .08 to .09 points above the minimum 
estimate of Table 2. 
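
The computation behind a coefficient of this kind may be sketched as follows. The function is a generic point-biserial correlation between a dichotomized predictor and a continuous criterion; the scores and criterion values shown are invented for the illustration and are not the data of this study.

```python
import math


def point_biserial(scores, criterion, cut):
    """Correlation between a predictor dichotomized at `cut`
    (score >= cut counts as "high") and a continuous criterion."""
    high = [c for s, c in zip(scores, criterion) if s >= cut]
    low = [c for s, c in zip(scores, criterion) if s < cut]
    n = len(criterion)
    p = len(high) / n          # proportion of cases above the cut
    q = 1.0 - p
    mean_all = sum(criterion) / n
    sd_all = math.sqrt(sum((c - mean_all) ** 2 for c in criterion) / n)
    mean_high = sum(high) / len(high)
    mean_low = sum(low) / len(low)
    return (mean_high - mean_low) / sd_all * math.sqrt(p * q)


# Hypothetical test scores and a skewed criterion with several zeros
scores = [8, 9, 10, 11, 12, 13]
criterion = [0, 0, 1, 0, 2, 5]
r_pb = point_biserial(scores, criterion, cut=12)
```

Moving `cut` up or down corresponds to changing the centile level of the dichotomy, which is the comparison summarized in Table 3.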

Third, the zero-order estimates of the table 
just cited are really inappropriate because it 
was originally proposed to construct a battery 
to be used as a battery, and not as a series of 
separate subtests. Accordingly, it may be 
noted that combining the PSA-W and PSA-T 
with the PHF yields a shrunken R of .28. 
This represents an increase of .03 points over 
the best zero-order validity coefficient of .25. 

If the effects of the three influences noted 
were assumed to be additive, they would 
argue for an estimated correlation of the order 
of magnitude of .38 to .39 between the com- 
bined predictors and the criterion, in Groups 




TABLE 4 


INTERCORRELATIONS AND VALIDITIES OF 
THE LHQ FACTORS 


LHQ factor                   |   2   |   3   |   4 
1. Socioeconomic background  | -.02  | -.07  | -.09 
2. Favorable self-perception |   —   | -.03  | -.13* 
3. Academic achievement      |   —   |   —   | -.19** 
4. Composite criterion       |   —   |   —   |   — 

Note.—N = 307. One hundred and fifty cases “held out” 
to cross validate multiple Rs. 
*p < .05. 
**p < .01. 


1 and 2 (combined), and with a selection 
ratio of 10-15%. To evaluate the reasonable- 
ness of this estimate a random sample of 100 
cases was drawn from the R & D group. The 
zero-order correlations were corrected for re- 
striction in range; the PSA scores and the 
PHF score were combined into a single variate 
through application of the “beta” weights 
derived from the R; and this combined pre- 
dictor distribution was dichotomized at the 
level of the top 10% of cases and the top 
15% of cases. The resulting validity esti- 
mates were .41 and .37, respectively, giving 
support to the approximate accuracy of the 
additive estimate. 

The second major purpose of this investiga- 
tion was to identify those personal non- 
cognitive characteristics (measured by the 
LHQ) and those environmental characteristics 
(measured by the JES) which have sig- 
nificantly facilitated or inhibited the expres- 
sion of creative potential by the present Ss 
between their initial testing in 1955 and the 
follow-up of 1964. Data pertinent to this 
purpose appear in Table 4 which contains the 
names assigned to the LHQ factors ultimately 
identified, their intercorrelations, and the cor- 
relation of each with the composite criterion. 
All the factor validities are low, but the 
identification of Factor 3 and its significant 
criterion correlation are worth a comment. 
The responses given by persons scoring high 
on this factor lead to the following characterization 
of the respondent: member of an 
honor society, high ranking in class, a scholarship 
winner, and a joiner of professional organizations. 
As indicated by the criterion correlation 
of -.19, the more creative Ss of this 
study could be so characterized a little less 
frequently than their fellows. It should, however, 
be borne in mind that virtually all Ss of 
Group 1 hold engineering degrees and are 
professionally employed. 

Corresponding data derived from analysis 
of the JES appear in Table 5. Once more all 
of the correlations are low, but the criterion 
validity of Factor 4 sets it apart as a variable 
of some interest and importance. The typical 
high scorer on this factor gave item re- 
sponses implying the following characteriza- 
tion of his job environment: The head of his 
department publishes, his colleagues hold advanced 
degrees, the company provides after-hours 
laboratory facilities for personal research, 
the head of his department has contributed 
to patents pending or held, and the 
head of his department has an M.S. or Ph.D. 
in engineering. Score on this factor correlates 
as highly with score on the composite criterion 
as the best cognitive predictor and suggests 
the appropriateness of some subsequent dis- 
cussion of its role and meaning. 

The third purpose of this study relates to 
the task of evaluating the possibility of a 
Type of Person X Type of Climate interaction. 
The procedures described earlier were first em- 
ployed to establish subsets of Ss with similar 
profiles on the four JES factors; four subsets 
were identified. The same methods were then 
utilized to identify seven subsets of Ss having 


TABLE 5 


INTERCORRELATIONS AND VALIDITIES OF 
THE JES Factors 











Factor 2 3 4 5 
Utilitarian  self-de- 
velopment — 12] —.01 03° |= .13* 
Supportive super- 
visory and peer 
relationships st (el Sn .08 
Perception of suc- 
cess — — .03 | —.01 
Professional and re- 
search _ orienta- 
tion of supervision} — — _— 26 
Composite criterion | — — — = 


Note.—N = 307. One hundred and fifty cases “held out” 
to cross validate Rs. 
*p < .05; 
**p < .01. 
similar profiles across the three factors of the 
LHQ. The subsets then became levels of a 
job climate factor and of a type of person 
factor, respectively, in an analysis of variance 
design with two criteria of classification. 
Table 6 contains the summary of the indicated 
analysis and reveals that the main effects as- 
sociated with both sets of subgroups are sig- 
nificant, but that the desired interaction does 
not even approach significance. To test this 
result in yet another form, persons in the 
four quarters of the distribution of scores on 
the PSA-W were regarded as subsets, con- 
stituting four levels of a type of person factor 
based on creative potential. A second analysis 
of variance was based on this revised di- 
mension versus type of job climate. This sum- 
mary appears as Table 7. Once again, no 
significant interaction emerged, and it was 
therefore concluded that what constitutes the 
job environment most conducive to creativity 
may be generalized across all types of Ss 
or levels of potential represented in the pres- 
ent R & D group. 
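
The F ratios of such a summary follow directly from the sums of squares and degrees of freedom. As a check on the arithmetic, the short sketch below recomputes them from the SS and df printed in Table 6; the function name and the dictionary layout are assumptions of the example.

```python
def f_ratios(effects, error_ss, error_df):
    """Return {source: (mean square, F ratio)} for each effect,
    using the error mean square as the denominator."""
    error_ms = error_ss / error_df
    table = {}
    for source, (ss, df) in effects.items():
        ms = ss / df
        table[source] = (ms, ms / error_ms)
    return table


# Sums of squares and degrees of freedom as printed in Table 6
table6 = f_ratios(
    {"JES subgroups": (2309.77, 3),
     "LHQ subgroups": (1870.08, 6),
     "Interaction": (1359.38, 18)},
    error_ss=30210.96, error_df=384)
```

The interaction mean square is smaller than the error mean square, so its F ratio falls below 1.00 and cannot reach significance at any conventional level.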


DISCUSSION 


Before any conclusions are drawn from the 
data presented, certain limitations in them 
and in the analytical methods employed must 
be pointed out. 


Restriction in Range 


Prominent among the methodological prob- 
lems of this investigation is that of a con- 
spicuous restriction in the range of talent 
employed. Clearly, within the available sam- 
ple, only the R & D engineers of Group 1 
had had an equal and substantial opportunity 
to produce tangible criterion evidence of their 
creativity. Yet if training and employment 


TABLE 6 

LHQ VERSUS JES DEFINED SUBGROUPS SUMMARY 


Source         |    SS     |  df |   MS   | F ratio 
JES subgroups  |  2,309.77 |   3 | 769.92 |  9.79* 
LHQ subgroups  |  1,870.08 |   6 | 311.68 |  3.96* 
Interactions   |  1,359.38 |  18 |  75.52 |   .96 
Error          | 30,210.96 | 384 |  78.67 | 

*p < .01. 
TABLE 7 


PSA-W VERSUS JES DEFINED SUBGROUPS SUMMARY 


Source           |    SS     |  df |   MS   | F ratio 
JES subgroups    |  1,802.35 |   3 | 600.78 |  7.83* 
PSA-W subgroups  |  1,217.53 |   3 | 405.84 |  5.29* 
Interactions     |    848.33 |   9 |  94.26 |  1.23 
Error            | 31,996.11 | 417 |  76.73 | 

*p < .01. 


have partially selected them for aptitude, they 
are restricted to an unknown extent on the 
PSA as compared with a population of all 
engineering graduates. If it is desired to 
draw even broader inferences regarding the 
relationship of intelligence to creativity in the 
general population, the data are obviously 
inadequate. For example, it may well be that 
there is some intellectual level beneath which 
creativity is seldom if ever seen. If so, the 
two variables are correlated to this extent. 
Given the present sample of graduates of a 
difficult and technical college curriculum, how- 
ever, it is difficult to conceive of any direct 
test of the hypothesis. 

Restriction in another sense exists because 
only one area of application was considered. 
No Type of Person X Type of Environment 
interaction was discovered. If one type of per- 
son is creative in music, another in art, an- 
other in literature, and still another in ma- 
chine design the finding may, clearly, be 
artifactual. That is, there may indeed be an 
interaction between type of creative person 
and type of optimum environment, but no 
evidence of it in this case because the area 
of machine design is occupied by only one 
type of creative person. In short, a limitation 
of this study is that it does not include the 
full spectrum of talent; conclusions must be 
qualified accordingly. 


Temporal Interval 


A second series of problems intrinsic to 
these data relates to the time elapsed between 
testing (1955) and the collection of criterion 
information (1964). If the interval had been 
shortened, criterion data would have been 
less complete and reliable. On the other hand, 
during a 9-yr period there have, no doubt, 
been some true, intrinsic, and differential 
changes in the creative capabilities of Ss. 
These, of course, attenuate and lower pre- 
dictor-criterion relationships in a manner 
which is for many purposes unwarranted. A 
case in point may be a postulated differential 
effect by school attended which would have 
occurred subsequent to testing during the 
junior and senior years. Thus, if evidence of 
short-term validity is desired, the present data 
probably provide an underestimate of it. 


Measurement 


Another sort of limitation of the present 
design is that it involves the mixing of con- 
current and predictive data. For example, the 
JES provides the former and the PSA the 
latter. To compare their validities implies that 
the procedures are equally exacting, whereas 
it is well recognized that such is not the case. 

A second illustration of the same type in- 
volves the PHF and LHQ. The former in- 
volves only eight open-ended items and has 
been shown to have predictive validity. When 
the follow-up was undertaken it was felt that 
many potential noncognitive predictors of im- 
portance had been omitted and the LHQ was 
constructed accordingly. However, the po- 
tential validity of this latter device was 
surely underestimated, since proven and 
critical items already included in the PHF 
were not reintroduced into the LHQ. 


Statistics 


Since the D² statistic was employed as a 
measure of profile similarity, it seems in order 
to point out one limitation of this index as 
used: It involves an unweighted summing 
of squared deviations across all dimensions 
of a profile. It was known that these dimen- 
sions had differing criterion validities and 
they might have been differentially effective 
as potential moderators as well. In any event, 
a given deviation entered the sum with the 
same weight regardless of its origin. The im- 
plication that each dimension is of the same 
intrinsic importance in the subgrouping is 
probably misleading. 
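
The point may be made concrete with a brief sketch. The unweighted form below is the index as described in the text; the weighted variant and its weights are purely hypothetical, shown only to indicate how differential importance of dimensions could enter, and were not used in this study.

```python
def d_squared(profile_a, profile_b, weights=None):
    """Sum of squared differences across profile dimensions.
    With weights=None every dimension counts equally, as in the text."""
    if weights is None:
        weights = [1.0] * len(profile_a)
    return sum(w * (a - b) ** 2
               for w, a, b in zip(weights, profile_a, profile_b))


# Two hypothetical Ss measured on three dimensions
s1 = [2.0, 0.0, 1.0]
s2 = [0.0, 0.0, 2.0]
unweighted = d_squared(s1, s2)                          # 4 + 0 + 1
weighted = d_squared(s1, s2, weights=[0.5, 1.0, 2.0])   # 2 + 0 + 2
```

In the unweighted sum the first dimension dominates the distance; under the assumed weights the first and third contribute equally, so two Ss may appear more or less alike according to the importance attached to each dimension.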


Criterion Heterogeneity 


Finally, it must be recognized that the Ss of 
this investigation were employed in a wide 
variety of industries, producing many differing 
products and employing a plethora of 
methods. It is a truism that the attempt to 
predict criteria embedded in these widely 
divergent contexts is fraught with inaccuracy. 
How much better prediction might be within 
one company is a matter for conjecture, but 
the concurrent phase of the present study 
(1957) suggests that the increment would be 
very substantial. 


INTERPRETATION 


An hypothesis suggested by the writer in 
1957 is that it is necessary to structure highly 
or control the association process in the pres- 
ent area of utilitarian creativity in order to 
enhance or optimize the validity of measure- 
ment. For example, the AMT requires only 
that Ss name machines of any sort within 
which the given mechanism might function. 
On the other hand, the PSA test requires 
that one start with a prescribed power source 
and produce a prescribed motion sequence. 

The entire matter is probably better con- 
ceived in terms of current theories of cogni- 
tive style (Schroder, Driver, & Streufert, 
1967). Adopting this frame of reference, it 
seems clear that the individual who can ac- 
cept numerous situational restrictions and still 
be creative is simply able to integrate more 
inputs than the one who cannot. Cast in this 
form the discriminating dimension emerges as 
one of cognitive complexity. 

The present finding of little relationship 
between mental ability and creativity test re- 
sults is in accord with the findings of Getzels 
and Jackson (1962). However, the meth- 
odology in the two cases is so divergent as 
to argue for little real precedent. In the case 
of academic achievement versus creativity, on 
the other hand, there is at least an apparently 
sharp contradiction in outcomes. It is the 
impression of the writer that this is more 
apparent than real. Getzels and Jackson 
(1963) present data which indicate that some 
cognitive measures beyond the IQ might be 
useful in the prediction of academic success. 
However, they present no evidence that their 
creativity tests actually measure any external, 
normative criterion of creativity of any char- 
acter. Thus, when they conclude that their 
high creativity-lower IQ group does as well 
in secondary school as their high IQ-lower 
creativity group, they are, in fact, only comparing 
selected groups which differ in their 
performances on two types of cognitive tests. 
In the present study, on the other hand, score 
on a cluster of LHQ items implying academic 
underachievement has been correlated with 
an external criterion of creativity and found 
to be significantly associated with it. Here 
creativity is defined not in terms of test per- 
formance, but in terms of such external evi- 
dence as patents and patent disclosures. Un- 
doubtedly each type of finding has value, but 
they are contradictory in name only. 

A result of greater importance concerns the 
relationship of JES Factor 4, professional and 
research orientation of supervision, to the 
composite criterion. Going back at least as 
far as the work of Adamson (1952) on “functional 
fixedness,” there has been considerable, 
understandable ambivalence regarding the 
character of leadership most appropriate to 
an R & D operation and most facilitative of 
creativity in general. The conflict has had 
polar opinions ranging from “If we expect 
people to be creative we can’t tell them what 
to do,” on the one hand, to “If we market 
products within a restricted range, all new 
ideas do not have equal utility and some 
guidelines must be prescribed,” on the other. 
A resolution has often led to a quite permis- 
sive philosophy of leadership. The present 
data, in another vein, suggest the appropriate- 
ness of something quite different. As indicated 
by the high loading items of Table 5, the 
optimum environment for the present Ss was 
one in which they were led by example: 
the example not only of their head but of 
their colleagues. It seems clearly implied that 
leadership from one who has done is not only 
tolerable, but that it probably constitutes a 
stimulus to do likewise. 

In a related way, it is clear that the treat- 
ment of Ss with the optimum environment is 
far from uniform and that this absence of 
uniformity tends to confound the relationship 
of predictors to criterion. Accordingly, adding 
JES Factor 4 to the multiple correlation of 
PSA-W and PHF with the composite criterion 
increases the relationship by .03 correlation 
points. This operation is equivalent to asking, 
“If we knew in advance the sort of environment 
in which our Ss would function, would 
a correction for its favorability or unfavor- 
ability not enhance prediction?” Both ra- 
tionally and empirically the answer is “Yes.” 

Finally, in speaking of the complexity of 
creativity, it may be of interest to observe 
the results of combining both predictive and 
concurrent validity coefficients as though they 
were comparable and of identifying the three 
best measures. This task was accomplished 
by introducing into a “tear-down” multiple 
regression analysis all 10 potential predictors 
of the composite criterion of creativity (4 
JES factor scores, 3 LHQ factor scores, 2 
PSA scores, and score on the PHF). Predic- 
tors were then dropped out in inverse order 
of contribution until only 3 remained. Among 
these, the largest independent contribution 
was made by JES Factor 4, professional and 
research orientation of supervision, the second 
largest by PSA-T, and the third largest by 
LHQ Factor 2, favorable self-perception. It 
is at least intriguing to note that one is a 
measure of the environment, one a cognitive 
measure, and one a noncognitive measure. 
Complex determination of creativity seems 
clearly implied. 
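
A tear-down analysis of this general kind may be sketched as below. The sketch assumes that "contribution" means the loss in squared multiple correlation when a predictor is removed, which is one common reading and not necessarily the rule of the program actually used; the small correlation matrix is hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(rxx, rxy, keep):
    """Squared multiple correlation for the predictors indexed by `keep`,
    from predictor intercorrelations rxx and validities rxy."""
    sub = [[rxx[i][j] for j in keep] for i in keep]
    betas = solve(sub, [rxy[i] for i in keep])
    return sum(b * rxy[i] for b, i in zip(betas, keep))


def tear_down(rxx, rxy, n_keep):
    """Drop predictors one at a time, each time removing the one whose
    loss costs least in R-squared, until n_keep remain."""
    keep = list(range(len(rxy)))
    while len(keep) > n_keep:
        drop = max(keep, key=lambda i: r_squared(
            rxx, rxy, [k for k in keep if k != i]))
        keep.remove(drop)
    return keep


# Hypothetical case: predictors 0 and 1 overlap (r = .50); 2 is independent
rxx = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
rxy = [0.30, 0.25, 0.20]
kept = tear_down(rxx, rxy, 2)
```

In this invented case the second predictor is dropped even though its zero-order validity exceeds that of the third, because most of what it predicts is already carried by the first. The same logic would favor a final set drawn from distinct domains.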


CONCLUSIONS 


1) The PSA test was found to be a better 
predictor of the composite criterion of crea- 
tivity in machine design than a well-regarded 
mental ability or scholastic aptitude test. The 
first hypothesis may thus be regarded as 
supported. 

2) The PSA test was found to be a better 
predictor of the present criterion than the less 
structured AMT. The second hypothesis and 
the implications of this finding for the rele- 
vance of the cognitive complexity dimension, 
therefore, may be regarded as sustained. 

3) Since both noncognitive and job environ- 
ment measures were found to be significant 
predictors of the present creativity criterion 
and to enhance prediction when added to that 
of a cognitive measure, Hypothesis 3 also 
may be regarded as sustained. 

4) Since no significant interaction between 
personal and environmental determiners of 
creativity could be detected, Hypothesis 4 
must be regarded as rejected in this context. 
5) With respect to the predictive validity 
of the battery of 1955, it is estimated that in 
a sample of graduates of schools of engineer- 
ing the best linear combination of measures 
should correlate .35 to .40 with a composite 
criterion of their subsequent creativity in 
machine design. A qualification is that the 
battery predicts better if scores are realist- 
ically dichotomized at a high level than if 
they are employed as a continuous variate. 
ARCHITECTURE SCHOOL PERFORMANCE 
PREDICTED FROM ASAT, INTELLECTIVE, 
AND NONINTELLECTIVE MEASURES ¹ 


CLIFFORD E. LUNNEBORG ² AND PATRICIA W. LUNNEBORG 


University of Washington 


Course grades and faculty ratings through fourth year architecture study were 
predicted for 228 students from four sets of variables: Architectural School 
Aptitude Test (ASAT) scores; ASAT scores complemented by 18 traditional 
academic predictors; the traditional battery alone; ASAT scores complemented 
by 16 biographic and interest items. ASAT scores alone predicted long-term 
criteria poorly, but complementing the ASAT with either academic or biographic 
variables produced the best predictions over all architecture criteria with 
shrunken validities from .43 to .58. Utility of predictors varied with criteria— 
faculty ratings were largely determined by traditional intellective measures 
while design performance was a function of nonintellective and background 
information which appears essential to prediction in areas of divergent thinking. 


This study is part of a continuing search 
for measures of divergent thinking and for 
better predictors of performance in occupa- 
tional areas depending on such ways of think- 
ing. The traditional predictors of college per- 
formance, that is, high school GPA and tests 
of verbal and quantitative aptitude, have 
always worked much better estimating suc- 
cess in English, mathematics, and biology 
courses than they have in art, music, and 
architecture. For this reason, the construction 
of the Architectural School Aptitude Test 
(ASAT; Educational Testing Service, 1965) 
centered around the predictive effectiveness 
of traditional measures versus tests designed 
specifically to tap abilities which architects 
had judged were related to success in archi- 
tecture school. 

The original validity study with the ASAT 
indicated that it did not outperform high 
school rank in class or GPA, but was a useful 
addition to high school record in predicting 
architecture performance (Pitcher, Olsen, & 
Solomon, 1962). Further, evidence was pre- 
sented that traditional verbal and mathematics 
scores in combination with high school record 
were inferior to the ASAT-high school record 
combination in predicting first year archi- 


¹ A modified version of this paper was presented 
at the American Psychological Association meeting, 
San Francisco, September 1968. 

² Requests for reprints should be sent to Clifford 
E. Lunneborg, Director, Bureau of Testing, Uni- 
versity of Washington, Seattle, Washington 98105. 


tecture GPA. Even ignoring high school record 
the verbal and mathematics tests were not as 
predictive as the six ASAT subtests (ad- 
justing for shrinkage). 

The present study was prompted by two 
effects of the high rate of attrition among 
students in the validation study (only 24% 
or 145 students had completed their studies 
in 5 yr.). First, the small size of samples at 
the 12 participating schools made the re- 
sults somewhat inconclusive. Secondly, pre- 
dictors were consequently judged primarily in 
terms of first year architecture GPA; long- 
term criteria such as completion or non- 
completion for academic reasons were neces- 
sarily slighted. It was felt that additional 
evidence of validity for the ASAT, traditional 
and nontraditional (nonintellective) measures, 
was needed over a range of criteria of archi- 
tecture school success. 


METHOD 


Subjects. The total sample consisted of 228 stu- 
dents entering the University of Washington School 
of Architecture between 1964 and 1966. This group 
was predominantly male (92%), single (96%), and 
from Washington state high schools (85%). 

Predictors and criteria. The initial pool of predic- 
tors consisted of age and sex, and (a) ASAT total 
and six part scores (interest vocabulary, sensitivity to 
visual phenomena, science reasoning, intersections, 
complex space fitting, and incorporated lines), (b) 
six cumulative high school GPAs: English, foreign 
languages, mathematics, natural sciences, social stud- 
ies, and full-credit electives, (c) 10 tests: ACE 
Psychological Exam (Quantitative), Guilford-Zimmerman 
Survey—Part I (verbal comprehension), 
CEEB intermediate mathematics, Washington Pre-College 
(WPC) tests of English usage, spelling, 
reading speed, reading comprehension, mechanical 
reasoning, spatial ability, and applied mathematics, 
and (d) 50 biographic and interest variables derived 
from admission applications or from a questionnaire 
administered in introductory architectural 
design. 


TABLE 1 


CORRECTED MULTIPLE CORRELATION COEFFICIENTS FOR BEST SETS OF PREDICTORS OF 
SEVEN CRITERIA OF ARCHITECTURE SCHOOL SUCCESS 


Predictor set                      | First year arch GPA | Second year arch GPA | Third year arch GPA | Fourth year arch GPA | Arch design GPA | All-university GPA | Average faculty rating 
ASAT total and six part scores     | 35(4)  | 38(3) | 14(1) | -06(1) | 20(1) | 29(2) | 43(2) 
18 traditional predictors          | 37(5)  | 33(3) | 56(6) | 40(5)  | 33(3) | 48(4) | 52(6) 
ASAT, 18 traditional predictors    | 46(10) | 43(4) | 58(6) | 48(7)  | 43(3) | 48(4) | 55(8) 
ASAT, 16 nonintellective variables | 48(8)  | 48(7) | 44(5) | 46(5)  | 52(9) | 38(7) | 54(7) 
N                                  | 226    | 201   | 147   | 78     | 124   | 228   | 166 

Note.—Number of variables in best set follows Rc in parentheses. Decimal points have been omitted. 


There were seven criteria: first year architecture 
GPA (5 quarter hr. of introductory architecture 
and 9 hr. of drawing), second year architecture 
GPA (18 hr. of architectural design and 6 hr. of 
water color), third year architecture GPA (18 hr. 
of architectural design and 24 hr. of technical 
architecture), fourth year architecture GPA (18 hr. 
of design and 29 hr. of technical architecture), 
architecture design GPA (design beyond second 
year), cumulative all-university GPA, and the 
average rating (5-point scale) by three architecture 
professors of student potential based on personal 
interviews in the second year of architecture. 

Procedure. Intercorrelations among the 75 predictors 
and seven criteria were the basis for narrowing 
down the number of variables for four sequential 
predictor selection analyses: ASAT total and part 
scores; age, sex, high school GPAs and test scores; 
ASAT with these 18 traditional predictors; ASAT 
with 16 of the original 50 nonintellective measures. 
In each of the sequential predictor selections (Horst 
& Smith, 1950) variables were added to the predictor 
set as long as their contribution to prediction 
outweighed the expected shrinkage in multiple 
correlation owing to increased number of predictors. 
No limit was placed on the potential number of 
predictors to be chosen so that as many useful 
variables would be identified as possible. Because of 
the fluctuation inherent in multiple correlations from 
one group to another, especially if groups are small, 
multiple correlation coefficients reported here have 
been corrected (Rc), that is, reduced to reflect the 
expected between-sample shrinkage owing to sample 
size and number of predictors. 


RESULTS AND DISCUSSION 


The mean ASAT total score for the entire 
group was 567; SD = 101. The average stu- 
dent entered architecture approximately 1 yr. 
after graduating from high school and 24% 
were enrolled in some other college prior to 
entering the university. 

The simple correlation coefficients for ASAT 
total score with all criteria compared closely 
to the multiple Rc's when ASAT total and 
part scores were reweighted to provide the 
best prediction. It thus appears that the 
original weighting devised for the ASAT is 
broadly applicable. The criteria based on 
third and fourth year work as well as the 
design GPA, however, had validities of only 
.18 and below with ASAT total score. In the 
prediction of faculty ratings slightly better 
predictions were obtained by increasing the 
weight given two of the six parts, interest 
vocabulary and science reasoning. Rc values 
for the reweighted ASAT parts are given in 
Table 1. 
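
The corrected coefficients discussed here are multiple correlations reduced for sample size and number of predictors. The exact correction formula is not stated in the text; the sketch below uses a Wherry-type adjustment as an assumed stand-in, with a hypothetical uncorrected R of .50, simply to show how the reduction grows as the sample becomes smaller.

```python
import math


def shrunken_r(r, n_cases, n_predictors):
    """Wherry-type adjustment of a multiple correlation for sample size
    and number of predictors; returns 0.0 if the adjusted R-squared
    would be negative."""
    r2_adj = 1.0 - (1.0 - r * r) * (n_cases - 1) / (n_cases - n_predictors - 1)
    return math.sqrt(r2_adj) if r2_adj > 0 else 0.0


# Hypothetical uncorrected R of .50 obtained from 6 predictors
small_sample = shrunken_r(0.50, n_cases=78, n_predictors=6)
large_sample = shrunken_r(0.50, n_cases=228, n_predictors=6)
```

With the same uncorrected value, the adjusted coefficient is noticeably lower for 78 cases than for 228, which bears on the comparison of criteria based on different numbers of students in Table 1.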

As can be seen from Table 1, for all criteria 
except first and second year architecture 
work, the traditional battery (age, sex, high 
school GPAs, and 10 tests) provided sub- 
stantially better predictions than ASAT 
scores. However, ASAT scores complemented 
either with the traditional battery or with 
the biographic and interest variables per- 
formed better than the traditional battery 
alone for all criteria save all-university GPA 
where prediction from social studies and 
natural science GPAs and English usage and 








TABLE 2 
STANDARD PARTIAL REGRESSION WEIGHTS FOR BEST SETS OF PREDICTORS OF 
SEVEN CRITERIA OF ARCHITECTURE SCHOOL SUCCESS 
First Second Third Fourth Arch All-uni- | Average 
Predictors year arch | year arch | year arch| year arch| design versity | faculty 
GPA GPA GPA GPA GPA GPA rating 
ASAT predictors 
ASAT total score 32 (1) 42 (1) 
ASAT Part I (interest vocab) 14(2) 36(1) 
ASAT Part II (sensitivity to phen) 09 (3) 13 (2) 
ASAT Part III (science reasoning) —13(2) 24(1) 15(2) 
ASAT Part IV (intersections) 17(1) | —13(1) 24(1) 
ASAT Part VI (complex space 
fitting) —07(3) 
ASAT Part VII (incorporated lines) | —06(4) 
Rc 35 38 14 -06 20 29 43
18 traditional predictors 
Sex (male) —11(5) 
Age 16(3) 
HS English GPA 44 (1) 
HS mathematics GPA 36(3) 
HS social studies GPA 20 (2) 47(1) | —46(4) 37 (1) 26(2) 
HS natural science GPA 3(1) 17(1) 30(2) 
HS electives GPA —13(5) —10(4) 
Guilford-Zimmerman verbal comp —33(3) —20(2) 
CEEB intermediate mathematics 25(2) | —32(2) 
WPC English usage test 13 (4) 28 (1) 
WPC spelling test 12 (3) 20 (4) 
WPC reading comprehension test 16(2) 15 (6) 
WPC mechanical reasoning test 36(1) —17(5) 16(3) 
WPC spatial ability test —14(4) 11 (6) 
WPC applied mathematics test 14(3) 13 (3) —10(5) 
Rc 37 33 56 40 33 48 52
ASAT plus 18 traditional pre- 
dictors 
ASAT total score 34 (1) 43 (1) 
ASAT part I (interest vocab) 21 (6) 24(1) 
ASAT part II (sensitivity to phen) —13(7) 
ASAT part III (science reasoning) —20(3) 19 (4) 40 (6) 
ASAT part IV (intersections) 30(2) 16(3) 16(4) 
ASAT part VII (incorporated lines) | —09(10) 
Sex (male) —14(7) —11(6) 
Age 15 (2) , 
HS English GPA 15(3) 39 (1) 
HS mathematics GPA 42 (3) 
HS social studies GPA 50(1) | —44(4) 43 (1) 23 (2) 
HS natural science GPA 21(2) 22 (1) 29 (2) 
HS electives GPA —12(6) —11(5) 
Guilford-Zimmerman verbal comp —33 (5) —35(3) | —23(7) | —26(3) 
CEEB intermediate mathematics 24(2) | —41(2) 
WPC English usage test 12(9) 11 (4) 17 (3) 
WPC spelling test 19(5) 
WPC reading comprehension test 08 (4) 09(8) 
WPC mechanical reasoning test 15 (8) —35(5) 
WPC spatial ability test —18(4) 
Rc 46 43 58 48 43 48 55
ASAT plus nonintellective 
variables 
ASAT total score 34(1) 40(1) 
ASAT Part I (interest vocab) 34(1) 
TABLE 2 (Continued)











First Second Third Fourth Arch All-uni- | Average 
Predictors year arch | year arch| year arch| year arch| design | versity | faculty 

GPA GPA GPA GPA GPA GPA rating 
ASAT Part II (sensitivity to phen) 13 (6) 14(5) 
ASAT Part III (science reasoning) —16(5) 22(1) 14(6) 
ASAT Part IV (intersections) 27 (2) 
ASAT Part VII (incorporated lines) | —09(6) 17(5) 
Father’s occupational level (Roe) —22(2) —26(2) —14(4) 
Father college graduate 45 (1) 
Mother employed outside home —17(3) 12(5) 
Mother college graduate —32(3) | —21(5) 
Firstborn (including onlies) —11(5) 
Interval HS to entrance in arch 13(8) 
Attended HS in state —09(4) | —11(4) 14(5) | —15(9) | —17(2) | —19(2) 
HS honor recipient 27 (1) 18 (4) 16(3) 
Part-time job in college 29 (3) 16(8) 18(3) | —10(7) 
Architecture HS vocational choice 14(7) 26(2) 31(1) 
Father’s occupation business contact —20(4) | —28(2) | —21(6) | —12(6) 
Father’s occupation technical 18 (3) —20(7) 
Creative people cited in art, arch 17 (3) 15(4) 
Service motivation for architecture 11(7) 09(7) 15 (4) 

Rc 48 48 44 46 52 38 54




















Note.—Order of selection in parentheses following weights with decimal points omitted. Predictor intercorrelations based on 
228 Ss administered ASAT of whom 166 had Washington Pre-College (WPC) scores and high school (HS) grades, and 186 had 
biographic data. Table includes only predictors selected at least once. 


mechanical knowledge tests could not be im- 
proved upon. The ASAT together with the 
traditional battery provided the best predic- 
tions of faculty rating, third, and fourth year 
grades, while biographic and interest items 
combined with the ASAT provided the highest 
multiple correlations with first and second 
year grades as well as advanced design. Table 
2 reports order of selection and standard 
partial regression weights for the variables in 
each predictor selection. 

Briefly, the biographic correlates of archi- 
tecture performance based on the predictor 
selections involving ASAT and nonintellective 
variables include the following. Roe’s (1956) 
occupational level of father was often selected 
and indicates that the higher family socio- 
economic status, the better will be student 
performance in architecture. Similarly, for 
fourth year grades, father’s education was 
the most potent predictor of all. A very good 
addition to prediction was having attended 
secondary school out of the state, and perhaps 
this variable too reflects socio-economic status 
through capacity to pay nonresident tuition 
and campus living costs. Performance in 
architecture was aided by having received 


honors in high school, by deciding in high 
school on a vocation in architecture, and 
curiously, by holding a part-time job in col- 
lege. Choosing architecture from a social ser- 
vice motivation especially contributed to 
faculty opinion of student potential. The last 
nonintellective variable of consequence was 
that of father’s occupation in business con- 
tact and selling (Roe, 1956) which adversely 
affected several criteria. 

A first conclusion from examining Table 1 
is that given the uneven predictability of 
criteria within a single school of architecture, 
probably any school wishing to use the ASAT 
must conduct its own validation study, select- 
ing and weighting variables which reflect the 
emphases in its particular curriculum. The 
relative importance to success of design 
courses, technical courses, and courses re- 
quired in areas outside architecture, such as 
physics and social science, will determine the 
kinds of predictors that get selected.

To illustrate this point from the present 
study, faculty ratings of student potential 
were best estimated from the interest vo- 
cabulary and science reasoning parts of the 
ASAT, high school natural science GPA, and 
WPC English usage. Remembering that verbal
and mathematics tests were excluded from 
the final ASAT battery on the grounds that 
they overlapped with interest vocabulary and 
science reasoning (Pitcher et al., 1962), 
faculty ratings would appear solely a function 
of traditional, intellective predictors. Ad- 
vanced design course performance, on the 
other hand, emphasized in its prediction 
one of the performance subtests of the ASAT, 
intersections, and a number of biographic 
and interest variables: early interest in archi- 
tecture, receipt of honors in high school, father 
employed in something other than selling or a 
technical occupation, mother not employed 
outside the home. All-university grade aver- 
age, depending in part on nonarchitecture 
course work required for graduation, was 
best predicted by the traditional “classic” 
battery of measures of academic aptitude and 
achievement. 

Although choice of criterion influenced the 
effectiveness of all predictors including the 
ASAT and its parts, this study provides ad- 
ditional evidence of the usefulness of the 
ASAT as a tool for guiding or advising pro- 
spective architecture students. It appears, 
however, that the effectiveness of the ASAT 
would be considerably reduced were it not
supplemented with other intellective measures
or with biographic data. A cautious generalization
is that where criteria are short-term,
augmentation with traditional predictors 
works well, but where criteria approach the 
ultimate in terms of architecture success, 
nonintellective background and interest vari- 
ables account for significant variance in ad- 
dition to the ASAT. For some time all archi- 
tectural criteria should be considered equally 
important. At this stage of exploring divergent 
thinking and its occupational counterparts, 
it is as critical to know how an individual will 
fare in his first year of study as it is to know 
whether he succeeds professionally some years 
hence. 


REFERENCES 


Educational Testing Service. Architectural School
Aptitude Test: A guide to interpretation of scores.
Princeton: Educational Testing Service, 1965.

Horst, P., & Smith, S. The discrimination of two
racial samples. Psychometrika, 1950, 15, 271-289.

Pitcher, B., Olsen, M., & Solomon, R. A study
of the prediction of academic success in architectural
school. Princeton: Educational Testing
Service, 1962.

Roe, A. The psychology of occupations. New York:
Wiley, 1956.

(Received April 15, 1968) 


Journal of Applied Psychology 
1969, Vol. 53, No. 3, 214-217 


RELATIONSHIP BETWEEN INTERVIEW STRUCTURE AND 

Eighteen experienced job interviewers were assigned randomly to three groups
which differed in the degree of interview structure. The structured group was not 
permitted to deviate from a predetermined interview format. The same format 
was used by the semistructured group and deviations were permitted. The 
unstructured group was free to interview applicants in any manner. A job 
specification for the job of clerk-stenographer was developed. On the basis of 
this specification, descriptions of five hypothetical job applicants were con- 
structed. A separate description was given to each of five female undergrad- 
uates who were instructed in their job applicant roles. Each group interviewed 
and then ranked the five job applicants. The amount of interinterviewer agreement
within groups was found to be positively related to the degree of interview
structure.


Three major reviews of the selection inter- 
view literature have concluded that structured 
interviews are probably superior to less struc- 
tured ones (Mayfield, 1964; Ulrich and 
Trumbo, 1965; Wagner, 1949). Mayfield 
(1964) found that published interview stud- 
ies reporting the greatest interrater agreement 
tended to employ structured interview for- 
mats. Ulrich and Trumbo (1965) and Wagner 
(1949) concluded that the greatest interview 
reliability and validity resulted when struc- 
tured interviews were used. These general 
findings, however, are not without exception. 
For example, Hakel (1966) employed a highly 
structured interview design in an employment 
situation, yet concluded that interrater “re- 
liability correlations . . . showed no more 
agreement than did Scott’s interviewers in 
1915 [p. 45].” 

Furthermore, any of these general con- 
clusions are of dubious value because they 
have been based on comparisons between 
studies which leave uncontrolled a number of 
variables known to influence interinterviewer 
reliability. Specifically, studies have shown 
that interviewers’ decisions are related to the 
homogeneity between applicants, that is, the
variance of total qualifications in the applicant
groups presented for appraisal (Carlson and
Mayfield, 1967; Sydiaha, 1958),
order of applicant presentation (Rowe, 1967), 
and the type of rating form employed to 
record interviewer judgments (Carlson and 
Mayfield, 1967). Other variables suspected of 
influencing interinterviewer reliability in- 
clude length of interview and type of job 
(Mayfield, 1964), and interviewers’ experience 
and knowledge of the job (McMurry, 1947). 
These and probably other uncontrolled vari- 
ables introduce an indeterminate amount of 
error into cross-study comparisons of inter- 
view reliability. 

Despite the obvious need to investigate the 
impact of interview structure under more con- 
trolled conditions, Mayfield (1964) stated 
“in no study located was the amount of struc- 
ture varied systematically to see what effect 
this would have on the results [p. 242].” The 
present study represents a preliminary attempt 
to close this gap by investigating the effect 
of three degrees of interview structure on 
interinterviewer reliability. Through the re- 
search design an effort was made to control 
for differences in rating forms, informational 
cues other than those called for in the degree 
of structuralization, order of applicant pre- 
sentation, homogeneity of applicants’ char- 
acteristics, interviewers’ knowledge of the 
job, and interviewing experience. The research
hypothesis stated that interinterviewer reli- 
ability is positively related to the degree of 
interview structure. 


METHOD 
Subjects 


Eighteen male assistant city department managers 
participating in a management development program 
served as Ss for the study. The Ss, who interviewed 
employment applicants as a normal part of their
job duties, were assigned randomly to one of three 
interview groups: structured, semistructured, and un- 
structured (n=6 per group). Comparison of the 
three groups on length of service in present job 
(M = 6.6 yr.) and number of employment interviews 
conducted (M = 22.3 per yr.) showed no significant 
differences. 

Five volunteer female undergraduates served as 
applicants for the job of clerk-stenographer em- 
ployed by the city government. Each of the girls 
had previous experience as an interviewee. 


Procedure 


The job of clerk-stenographer was chosen for the 
study because Ss were generally familiar with that 
position. A detailed clerk-stenographer job specifica- 
tion was developed describing the nature of the 
work, essential knowledge and abilities, and neces- 
sary training and work experience. This specification 
closely paralleled the actual clerk-stenographer job 
specification used by the city. A job application 
form also was prepared which provided for in- 
formation on biographical data, education, work 
experience, tested abilities for the job, outside in- 
terests and activities, and stated interest in the 
job. 

A hypothetical description, which included a com- 
pleted application blank for each girl, was developed 
for each of the five participating applicants. All 
applicants were described as meeting the minimum 
qualifications for the position required by the job 
specification, though their qualifications varied above 
the minimum. In addition, the hypothetical descrip- 
tions varied in terms of age, marital status, place 
of residence, outside interests and activities, and 
interest in the job. 

An effort was made to develop low homogeneity 
among applicants in order to insure a fairly high 
degree of agreement among interviewers in the struc- 
tured group. A pretest was conducted by giving 
judges not associated with the final study the five 
written applications, with instructions to rank them 
from best to poorest. Because these original de- 
scriptions elicited low agreement between judges, 
they were rewritten to increase the heterogeneity of 
the information contained on the application blanks. 
In most cases, this involved an alteration of the 
applicants’ tested abilities for the job, though these 
never were lowered below the necessary hiring 
minimum. 
After revision, a hypothetical description was
given to each applicant for memorization. In addition 
to the information on the application blank, each 
girl was given other information which dealt with 
home and family life, future plans, and more de- 
tailed work experience data to complete their 
hypothetical descriptions. The applicants were in- 
formed about the interviewing procedures they would 
experience and were told of the importance of pro- 
viding consistent answers to similar questions be- 
tween interviewing groups. 

All Ss were given copies of the detailed clerk- 
stenographer job specification and told that they 
would interview five applicants for that position. 
Although Ss did not know the purpose of the study, 
they were told that the applicants were in reality 
college students and that the information they ob- 
tained from the applicants was written specifically 
for this study. In addition, Ss were told to assume 
that the applicants had passed the necessary tests 
and interviews in the centralized personnel office 
and had been referred to Ss for a hiring interview. 
This procedure was identical to the selection pro- 
cedure used in the city government. Finally, Ss were 
told that each interview group had a maximum of 
10 min. to interview each applicant, that after com- 
pleting all of the interviews they should rank the 
five girls as potential clerk-stenographers from best 
(1) to poorest (5), and that they should not dis- 
cuss the applicants between interviews or during the 
ranking process. 

Each group then was assigned a separate room 
and given further specific instructions. (a) Struc- 
tured group: The Ss were given blank copies of 
the prepared application form and instructed to ask 
each applicant factual questions pertaining only 
to the information called for on the application form. 
(b) Semistructured group: The Ss were also given 
blank copies of the same form and were told to ob- 
tain all information called for on the application 
form. In addition, this group was told that they 
could ask each applicant any additional or follow-up 
questions which appeared appropriate. (c) Unstruc- 
tured group: The Ss were not given application 
forms. Rather, they were told that they could ask 
the applicants any questions during the time al- 
lotted for each interview. 

A time schedule was drawn up for the interviews, 
and the applicants were assigned randomly to the 
three interview groups and the five time periods. 
Kendall’s coefficient of concordance was used to 
determine the amount of agreement between Ss’ 
rankings within each of the three groups (Siegel, 
1956). 
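
As an illustration of the statistic named above, the following sketch computes Kendall's coefficient of concordance W for untied rankings, using the standard formula W = 12S / [m^2 (n^3 - n)], where m is the number of judges, n the number of objects ranked, and S the sum of squared deviations of the rank sums from their mean. The function name and the example rankings are hypothetical and are not the data of this study.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m complete, untied rankings.

    rankings: list of m lists; each inner list gives the ranks (1..n)
    that one judge assigned to the same n objects, in the same order.
    """
    m = len(rankings)
    n = len(rankings[0])
    for ranks in rankings:
        if sorted(ranks) != list(range(1, n + 1)):
            raise ValueError("each ranking must use ranks 1..n exactly once")
    # Sum of ranks received by each object across judges.
    rank_sums = [sum(ranks[j] for ranks in rankings) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical rankings of five applicants (A-E) by six interviewers.
example = [
    [1, 2, 3, 4, 5],
    [1, 3, 2, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 2, 4, 3, 5],
    [1, 2, 3, 4, 5],
    [2, 1, 3, 4, 5],
]
print(round(kendalls_w(example), 2))
```

W ranges from 0 (no agreement among judges) to 1 (identical rankings), so a larger value within a group indicates greater interinterviewer agreement.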


RESULTS 


The results are presented in Tables 1 and
2. Table 1 shows interviewer rankings of the
applicants within each group. It is apparent
that the greatest degree of interranker agreement
exists in the structured group. Only
Applicant C obtained as consistent rankings
in the semistructured or unstructured groups.


TABLE 1

APPLICANT RANKINGS BY INTERVIEWERS WITHIN EACH GROUP

[Table body not recoverable. Rows: job applicants. Columns: interviewers in the structured group (n = 6), semistructured group (n = 6), and unstructured group (n = 6).]


A statistical summary of the data is presented
in Table 2. As hypothesized, the amount of
agreement between interviewers increased as
the degree of interview structure increased.
While to our knowledge it is not possible to
test the significance of the difference between
groups for the coefficient of concordance, the
differences in the magnitude of the coefficients
are in the hypothesized direction.


TABLE 2

KENDALL’S COEFFICIENT OF CONCORDANCE
FOR EACH INTERVIEW GROUP

                  Structured    Semistructured   Unstructured
Item                group           group           group
Coefficient of
  concordance     [illegible]       .43*            .36

 *p < .05.
**p < .01.


DISCUSSION


The findings obtained in the present pilot
investigation are important because they represent
the first known effort to investigate
systematically the impact of the degree of
interview structure on interinterviewer reliability.
There are, however, several factors in
addition to the small sample size which should
be considered before generalizing from the
results obtained here.

First, attention must be directed to the fact
that applicant qualifications were deliberately
manipulated to ensure a moderate degree of
interinterviewer agreement in the structured
group. If there is interaction between applicant
homogeneity and interview structure,
the results obtained could be varied solely by 
changing the homogeneity between applicants. 
Specifically, if little difference exists between 
applicants in terms of job qualifications, low 
interranker agreement might well be obtained 
in all groups. Alternatively, applicants might 
be so heterogeneous that interinterviewer 
agreement in the structured interview might 
be close to 1.0. There thus would be little 
opportunity for higher reliabilities to be ob- 
tained in the less structured interview groups. 

An examination of the interinterviewer 
agreement obtained (Table 2) suggests that 
neither extreme occurred. While interviewer 
agreement in the structured group was fairly 
high, it was not perfect. Thus, the possibility 
existed for greater interinterviewer agreement
in the less structured groups. The authors 
deliberately sought to enhance this possibility 
by withholding some questions from the struc- 
tured and semistructured format which elicited 
relevant information about the applicants. In 
fact, greatest interinterviewer agreement oc- 
curred in the structured group. It thus ap- 
pears that when there is little homogeneity 
between applicants, the degree of interview 
structure has an impact on the reliability of 
interview rankings. 

Major needs for future research on the
determinants of interinterviewer reliability are
suggested by this finding. Subsequent research
on interview structure should vary the degree
of homogeneity between applicants. Conversely,
investigators concerned with the impact
of varying degrees of applicant homogeneity
should investigate its possible interactive
effects with interview structure.

Finally, it should be remembered that the 
interviewers ranked the applicants and that 
ties were not permitted. There is evidence 
suggesting that ratings tend to result in a 
higher degree of interinterviewer reliability 
than rankings because each applicant is rated 
independently and because ties are possible 
(Carlson and Mayfield, 1967). The amount 
of agreement within each group thus may be 
understated, relative to the amount of agree- 
ment possible with the use of ratings. The 
actual comparison of the agreement between 
rankings and ratings over varying degrees of 
interview structure remains to be tested. 

In summary, on the basis of the sample 
and technique employed, it appears that the 
degree of interview structure may have a sig- 
nificant impact upon the degree of inter- 
interviewer agreement. The limited number of 
judges and applicants suggests the need for 
replication of this finding. Furthermore, ad- 
ditional research is required to determine 
possible interaction between degrees of ap- 
plicant homogeneity and interview structure. 

The performance of Ss trained in a visual monitoring task with an autoinstructional
device was compared with that of Ss trained by practice alone.
The experimental group had three 50-min. training sessions on a device which
included the standard monitoring task, but allowed S to select his signal
schedule and call for immediate knowledge of results, or signal cueing
(prompting), or both, and to test himself with no training aids available.
Subsequent testing on the standard task revealed that Ss trained with autoinstruction
showed a much higher detection rate (p < .001) than the control
group, with no increase in commissive errors. Reasons for the success of
autoinstruction in vigilance training are discussed.


Vigilance, or monitoring displays for diffi- 
cult-to-detect, low-probability signals, has been 
studied extensively since World War II, but 
only recently have investigators turned to train- 
ing techniques. Mere practice at a vigilance 
task results in no more than slight improve- 
ment in detections (Wiener, 1968), but several 
Es have been successful in obtaining positive 
transfer effects when Ss were first trained 
with immediate knowledge of results (KR) 
and then transferred to a condition where they 
were deprived of this feedback information 
(Adams & Humes, 1963; Hardesty, Trumbo, 
& Bevan, 1963; Mackworth, 1964; Wiener, 
1963, 1967). Recently another training aid, 
known variously as prompting, cueing, or 
alerting, whereby the operator is warned that 
a signal is soon to appear, has been em- 
ployed, but with mixed success (Annett & 
Paterson, 1967; Colquhoun, 1966). The use 
of KR and cueing as training aids for per- 
ceptual tasks has been reviewed by Aiken and 
Lau (1967). Wiener and Attwood (1968) 
used combinations of the two in a factorial 
experiment and found KR highly beneficial to 
transfer of training, but cueing ineffective. 

Recent research in autoinstructional techniques
outside of the monitoring field, employing
devices ranging from simple teaching
machines to elaborate computer-based systems,
suggests some possibilities for training
monitors. While the results of the transfer
of training experiments previously cited have 
been promising from a practical point of 
view, highly regimented training programs 
may be less effective than autoinstructional 
techniques in which the trainee would have 
some options. By implementing a flexible 
program with options on the training aids, S’s 
performance may be enhanced for one or 
more of the following reasons: (a) S may 
actually know, or think that he knows, the 
best way to train himself; (b) S’s motivation
may be heightened when he is an “active 
partner” in his own training, rather than a 
passive recipient of a rigid training regime; 
(c) autoinstruction allows elements of self- 
pacing and self-testing. 

There is currently no experimental evidence 
on the use of self-training devices for monitor- 
ing tasks. The closest to it are the experi- 
ments of Swets, Millman, Fletcher, and Green 
(1962), Swets, Harris, McElroy, and Rudloe 
(1966), and Weisz and McElroy (1964), who 
used computer-aided techniques, including 
KR, prompting, and others, in the identifica- 
tion of multidimensional aural and visual pat- 
terns. They found that Ss trained in accord- 
ance with the usually accepted principles of 
autoinstruction did not show significantly 
better results than those given simple stimulus 
presentations. Also, in the second Swets et al.
study (1966), Ss had a variety of options
available, but their performance was no
better than that of Ss in the previous experiment
(1962), where they had received a
fixed regime. 

In this study an experimental group per- 
formed, on five successive days, a pretest, 
three sessions of training on the autoinstruc- 
tional device, and a posttest. A control group 
performed all five sessions under “standard 
conditions” of visual monitoring, which were 
identical to the pretest and posttest of the 
experimental group. The autoinstructional 
device allowed Ss in the experimental group 
options over their use of KR, cueing, the 
signal rate and distribution, and provided the 
option for self-test periods without training 
aids.


METHOD 
Subjects 


The Ss were 34 male undergraduates from the 
University of Miami, with no previous experience in 
the monitoring task. They were recruited by means 
of posters displayed on campus and were paid $10 
upon completion of the 5-day experiment. 


General Apparatus Features 


The monitoring task for both groups was the 
detection of an abnormally large deflection of a 
voltmeter needle. The meter faceplate was painted 
flat black and the needle was white. The nonsignal 
stimulus was a 20° rightward needle deflection, 
while the signal was a 30° deflection. Both the 
signal and nonsignal stimuli were the result of 
electrical discharges applied to the voltmeter through 
a resistor-capacitor circuit, triggered from a coded 
punch paper tape. In both the standard vigilance 
program and the training program, tape readers 
produced 50 stimuli/min. 


Standard Apparatus 


Room 1 was partitioned into three booths, in 
which Ss viewed the voltmeter via closed-circuit 
television. To achieve auditory isolation Ss wore 
earphones which played white noise. The Ss in- 
dicated the presence of a signal by pressing a 
silent pushbutton switch, mounted on the table in 
front of them. Responses were recorded on an 
Esterline-Angus event recorder, and also on two 
banks of reset event counters that recorded the num- 
ber of missed signals and sorted S’s responses into 
detections and false alarms. 


Training Group Apparatus 


The apparatus in Room 2 consisted of a display
board, a selection panel, a silent pushbutton response
switch, and earphones. The display board was
a vertical black panel mounted 30 in. in front of S.
It contained the recessed meter display, a set of
three color-coded KR lights, information lights for
the rest and selection periods, and four one-plane
digital readouts which provided summary feedback
information to S. The selection panel contained 12
momentary-contact switches mounted on a plastic,
translucent faceplate. These illuminating switches
were arranged into five vertical columns which corresponded
to the five choices in the TRAIN mode.
The monitoring task per se was the same as the
standard task. The selection panel operated a relay
logic bank which gated the correct signal and cueing
information from punched paper tape to the display.
Performance data were collected as in the standard
task. The option choices were recorded on 15 pens
of the Esterline-Angus recorder.


TABLE 1

INTERSIGNAL INTERVALS (IN SECONDS) FOR
SIGNAL RATE OPTIONS

Signal                   Signal rate
distribution      Slow     Medium     Fast
Fixed              36        24        12
Random             72a       30a       --

a Average intersignal interval.


The earphones were connected through an intercom 
system to three separate audio sources. A 400 cps 
recorded tone was used as a background noise when 
the branching mode was activated, and a 1000 cps 
tone provided the alerting signal during the cueing 
option. At all other times, except during the rest 
period, white noise was used for auditory isolation; 
during the rest period the earphones were silent. 


Signal Schedules 


The standard monitoring task provided 32 signals
per session, spaced in a different random order in
each of the five daily 48-min. sessions. The 32
signals were equally divided into four 12-min.
periods within each session, with the minimum
intersignal interval being 18 sec. In the training
sessions, three signal rates and two signal distribu- 
tions were available as options to S. The signal 
frequencies are listed in Table 1. The minimum inter- 
signal interval was 9.6 sec. The intersignal intervals 
for the random distributions were determined from 
a table of uniformly distributed random numbers. 
The RANDOM SLOW signal rate provided either 
4, 5, or 6 signals per 6-min. period. 
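
The arithmetic behind these schedules can be checked with a short sketch. The intervals are those of Table 1 and the stimulus rate is the 50 stimuli/min. given earlier. The variable and function names are hypothetical, and dividing the period by the interval gives only an approximate signal count per period.

```python
# Intersignal intervals in seconds, as listed in Table 1.
# Random entries are averages; no RANDOM FAST option is listed.
INTERSIGNAL_SEC = {
    ("fixed", "slow"): 36,
    ("fixed", "medium"): 24,
    ("fixed", "fast"): 12,
    ("random", "slow"): 72,
    ("random", "medium"): 30,
}

PERIOD_SEC = 6 * 60  # one 6-min. training period


def mean_signals_per_period(interval_sec, period_sec=PERIOD_SEC):
    """Approximate number of signals in one period at a given mean interval."""
    return period_sec / interval_sec


def standard_signal_probability(signals=32, minutes=48, stimuli_per_min=50):
    """Proportion of stimuli that are signals in one standard session."""
    return signals / (minutes * stimuli_per_min)


for option, interval in INTERSIGNAL_SEC.items():
    print(option, mean_signals_per_period(interval))
print(standard_signal_probability())
```

The RANDOM SLOW mean of 72 sec. corresponds to 5 signals per 6-min. period, which agrees with the stated range of 4, 5, or 6.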


Procedure 


The Ss were randomly assigned to two equal 
groups, Control (C) and Training (T). On the 
first day all Ss monitored for a 48-min. session on 
the standard apparatus. The Ss were read standard 
instructions, including a demonstration of five signals 
spaced 10 nonsignals apart. 
FIG. 1. Percentage of signals detected as a function
of length of watch by training and control
groups on pretest and posttest days.


Since Group T monitored on both the standard 
and training equipment, they were told that the 
change of apparatus was to determine how well 
they performed on Day 5 after receiving three days 
of training. To cancel any bias due to these in- 
structions, Ss in Group C were also told that the 
results of the Friday session would determine the 
effects of practice, and therefore these posttest re- 
sults were what counted. For the next three days 
Group C returned to the same task. On the last 
day, Ss in both groups monitored the standard task 
for 48 min. 


Autoinstruction 


On the second day each S in Group T monitored 
in Room 2. Recorded instructions explained the use 
of the equipment. For each 6-min. period, S could 
choose either the TRAIN or TEST mode. He was 
allowed a 1-min. rest between periods and 20 sec. 
to choose the next program. The TRAIN mode 
allowed S free choice at all times of five signal 
rates, KR, and cueing, as well as a choice of 
whether to receive the corrective branching option. 
Choice of the KR option presented S with a green 
detection light as long as the response switch was 
depressed within 2.5 sec. following a signal. When the 
response switch was pressed outside of this 2.5 sec. 
interval, a red false alarm light was illuminated. 
If no response was made within the 2.5-sec. interval 
following a signal, the amber missed-signal light was 
illuminated for 2 sec. By choosing the CUE option, 
S was presented with a 1000 cps. tone 2.4 sec. in 
duration beginning 3.4 sec. before the onset of each 
signal. These aids were essentially the same as those 
employed in a previous study (Wiener & Attwood, 
1968). 
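The KR scoring rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `score_responses`, the representation of signals and switch presses as times in seconds, and the rule that each signal can be credited with at most one detection are hypothetical and are not taken from the apparatus description.

```python
def score_responses(signal_times, response_times, window=2.5):
    """Classify switch presses as the KR lights are described.

    A press within `window` sec. after a signal onset is a detection
    (green light); any other press is a false alarm (red light); a signal
    with no press inside its window is a miss (amber light).
    Assumption: a signal is credited with one detection at most.
    """
    detected = set()          # indices of signals already detected
    false_alarms = 0
    for r in sorted(response_times):
        hit = None
        for i, s in enumerate(signal_times):
            if i not in detected and 0.0 <= r - s <= window:
                hit = i
                break
        if hit is None:
            false_alarms += 1     # press outside every open window
        else:
            detected.add(hit)     # press inside a signal's window
    misses = len(signal_times) - len(detected)
    return {"detections": len(detected),
            "false_alarms": false_alarms,
            "misses": misses}


# Hypothetical times (sec.): one detection, two false alarms, two misses.
summary = score_responses([10.0, 30.0, 50.0], [11.0, 20.0, 53.0])
print(summary)
```

The three counts returned are the same quantities that the TEST mode is described as reporting to S as summary knowledge of performance.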

The branching option, which was called the Forced 
Signal Mode (FSM), was included to provide an 
automatic corrective learning sequence for S in the 
event of a missed signal. It forced S to receive four 
signals at the FIXED FAST signal rate, accom- 
panied by KR and cueing. At the end of the timed 
interval, the program automatically returned to the
options which S had chosen prior to the FSM. The 
S could suppress the FSM at any time during the 
6-min. TRAIN period except when FSM was ac- 
tivated. 

The TEST mode presented the RANDOM SLOW 
signal rate for the entire 6-min. period with neither 
KR nor cueing available. At the end of the TEST 
period, summary knowledge of performance (detec- 
tions, false alarms, and missed signals), was presented 
to S by means of the digital readouts. On the second 
day S monitored through three 6-min. periods to 
become acquainted with the equipment and task. 
On the third and fourth days, seven 6-min. periods 
were run. At the beginning of the third and fourth 
daily sessions, and at the end of the fourth session, 
each S was given summary knowledge of his cumula- 
tive performance in the TEST mode on mimeo- 
graphed sheets. 


RESULTS 


The primary results consisted of the num- 
ber of signal detections and the number of 
commissive errors (false alarms) produced by 
each S on the pretest and posttest days. The 
length of time spent in the various options by 
each S in Group T during the training days 
was also recorded. 


Detections 


Figure 1 shows the performance of Groups 
C and T on the pretest and posttest days 
with percentage of detections plotted against 
the four successive 12-min. time periods. A 
partially hierarchal analysis of variance was 
performed separately on the pretest and post- 
test detection data. The Ss were nested 
within groups, but common to the four time 
periods. The main effect of groups (G; train-
ing vs. control) and the Groups × Periods in-
teraction were not statistically significant at
the .05 level, but the periods main effect was
(F = 3.48; df = 3, 96). In the posttest analy-
sis, the results indicated that the group ef-
fect was significant (F = 8.40; df = 1, 32;
p < .01), while neither the Groups × Periods
interaction nor the periods main effect was 
significant. 

In order to test the relative gain of the two 
groups from the first to the final day, a simi- 
lar analysis was performed on the total num- 
ber of signals detected in each session 
(summed across periods) for each S on Days 
1 and 5. The Ss’ totals were nested within 
training conditions but common to the two 
days, the pretest and posttest days being
treated as time periods. The results of this
analysis indicate that the days effect (D)
was significant (F = 40.17; df = 1, 32; p <
.001), as well as the Groups × Days (G × D)
interaction (F = 10.55; df = 1, 32; p < .01).
To test whether the increase shown by Group 
C was significant, a two-way analysis of vari- 
ance was performed on their session totals, 
with the pretest and posttest days treated as 
time periods. A significant between-days ef- 
fect (p < .05) was found. 


Commissive Errors 


The number of commissive errors in the 
pretest and posttest days for both groups is 
shown in Table 2. Poock and Wiener (1966) 
have pointed out that due to an extreme 
skewness in the distribution of commissive 
errors in this type of experiment, tests of 
goodness-of-fit always result in a significant 
deviation from the hypothetical uniform dis- 
tribution. The median test diminishes skew- 
ness effects by counting the number of Ss in 
each experimental group contributing more 
or less than the median number of commis- 
sive errors of the total sample. The con- 
tingency table thus formed can be analyzed 
using the chi-square statistic. The median 
number of commissive errors made in Days 1 
and 5 was zero. In this extreme case, the 
contingency table was formed by counting 
the number of Ss on each day making zero 
commissive errors, and those making one or 
more commissive errors. The test yielded a 
chi-square of 4.78, with df = 1. Even though 
the power of this test is reduced by the fact 
that Ss were common to Days 1 and 5, this 
TABLE 2

COMMISSIVE ERRORS FOR DAYS 1 AND 5 BY GROUPS

Group       Day 1    Day 5    Total
Training      33        5       38
Control       43       19       62
Total         76       24      100





value still indicates a significant (p < .05) 
decrease in the number of commissive errors 
from pretest to posttest. The median number 
of commissive errors on Day 5 was zero. A 
median test for groups on Day 5 was formed 
by again counting the number of Ss in each 
group making zero commissive errors and 
those making one or more commissive errors. 
This yielded a chi-square less than unity, 
indicating no significant between-group dif- 
ference in the contribution of commissive er- 
rors on Day 5. 
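The median-test procedure described above (splitting Ss into those making zero and those making one or more commissive errors, then testing the resulting 2 × 2 table) can be sketched as follows. The function names and the error counts are hypothetical; the per-S counts are not reported in the text, so the numbers below are placeholders and not the study's data. The continuity correction is offered as an option because the text does not say whether one was applied.

```python
def zero_vs_nonzero(error_counts):
    """Return (number of Ss with zero errors, number with one or more)."""
    zero = sum(1 for e in error_counts if e == 0)
    return zero, len(error_counts) - zero


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square (df = 1) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # optional continuity correction
        diff = max(diff - n / 2.0, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * diff ** 2 / denom


# Hypothetical commissive-error counts for eight Ss on two days.
day1_errors = [0, 2, 1, 0, 3, 0, 1, 4]
day5_errors = [0, 0, 0, 1, 0, 0, 0, 0]

a, b = zero_vs_nonzero(day1_errors)   # Day 1: zero errors, one or more
c, d = zero_vs_nonzero(day5_errors)   # Day 5: zero errors, one or more
chi_sq = chi_square_2x2(a, b, c, d)
print(a, b, c, d, round(chi_sq, 2))
```

With df = 1, the obtained value would be compared with the tabled critical value of 3.84 for the .05 level.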


Analysis of Training Options 


Table 3 summarizes the percentage of time 
spent in the training options by each S as well 
as the percentage of signals received by each 
S accompanied by KR or cueing. The
percentages of signals include those received
in the FSM. There was no significant cor- 
relation between individual option times and 
either the posttest data or the increase in 
performance from pretest to posttest. Table 
3 shows that KR was called for 91.4% of the 
total TRAIN mode time, but cueing was rela- 
tively unpopular (35.5%). CUE, FIXED 
MEDIUM, and FIXED SLOW options 
dropped off sharply after the first training, 


TABLE 3

PERCENTAGE TIME IN TRAINING OPTIONS AND PERCENTAGE SIGNALS WITH KR
OR CUEING DURING DAYS 3 AND 4

                             Percentage time                              Percentage signals
Training   Fixed   Fixed    Fixed         Random   Random                     With    With
day        fast    medium   slow          medium   slow     KR     Cueing     KR      cueing
2          23.5    24.2      7.9          21.5     22.9     82.5   40.3       84.8    54.2
3          23.6    13.3     [illegible]   19.2     36.2     93.0   32.0       93.8    43.4
4          27.0     9.7      8.4          23.1     31.8     95.0   36.5       94.0    47.4
Total      25.1    11.5      8.1          21.1     34.2     93.9   34.1       93.9    45.3
TABLE 4

TEST MODE CHOICE AND PERFORMANCE DATA
OVER THE THREE TRAINING DAYS

           Percentage    No. of                    Average commis-
Training   periods in    signals      Percentage   sive errors per
day        test mode     presented    detections   test period
2          [illegible]      87          86.2          0.555
3          47.0            272          93.3          0.196
4          54.6            328          94.2          0.062





session. Tables 3 and 4 point out the popu- 
larity of the two extreme signal rates (RAN- 
DOM SLOW and FIXED FAST options). 
The TEST period data are presented in 
Table 4. All attempts to correlate the per- 
formance in the TEST periods with the post- 
test performance or with S performance in- 
crease failed. The results from Table 4 show 
a steady increase in detection rates from 
Day 2 through Day 4, accompanied by an 
increase in the number of TEST periods 
chosen. At the same time, the average com- 
missive error rate declined. 


DISCUSSION 


The results of this experiment show strong 
promise for the use of an autoinstructional ap- 
proach to vigilance training. Superior signal 
detection performance of Group T over Group 
C in the final session, with no increase in 
commissive errors, indicates that an improve- 
ment in detection performance is not neces- 
sarily accompanied by an increase in the 
number of false alarms. 

The significant (9.1%) increase in detec- 
tions of Group C, from pretest to posttest, is 
the product of a steady increase over the five 
practice sessions. This result, as well as the 
“end-spurt” in the final 12-min. periods, is 
not uncommon in monitoring studies. Wiener 
(1968) found an increase of approximately 
25% from pretest to posttest in a similar 
5-day experiment. It is assumed that one 
reason for the improved performance of 
Group C was the added incentive introduced 
by informing Ss that their scores on the final 
day were most important to E. This argu- 
ment might also be applied in some portion 
to Group T. 
In the posttest, Ss in Group T showed very
little variation in the percentage of detected 
signals over the four time periods, and varied 
only slightly from each other. This was due 
primarily to a “ceiling effect,” the detection 
rates being so close to 100%. The homoge- 
neous performance trend of Group T in the 
posttest explains the lack of correlation be-
tween use of the options, which was highly
variable, and the relatively nonvarying final
performance data. 

From the data presented in Table 3, one 
of the most noticeable features is the time 
spent in the KR option. The frequent choice 
of KR clearly points out that Ss sought 
knowledge of performance. In the NO-KR 
option, Ss’ behavior was observed to be gen- 
erally erratic, and Ss tended to play with the 
other options. 

This experiment provides no evidence to 
support the training value of cueing. The Ss’ 
choice of CUE was generally limited to the 
first half of a 6-min. period. The time spent 
in the CUE option generally dropped as the 
training progressed. This decrease in the use 
of cueing is possibly due to Ss becoming 
antagonistic toward the physical character- 
istics of the alerting tone. Another explana- 
tion is that once S learns (or relearns) the 
characteristics of the signal, he might desire 
more “challenge” from the task than the cue- 
ing option can provide. Wiener and Attwood 
(1968) have mentioned that Ss frequently 
volunteered the complaint that cueing denies 
them any challenge. The constant suppression 
of the FSM feature, and the apparent effort 
of Ss to evade the FSM when it was not sup-
pressed, indicate that the importance of this
corrective branching option lies not in what 
it teaches, but in its perception as a punish- 
ment rather than a training aid. 

In conclusion, the autoinstructional ap- 
proach to vigilance training results in in- 
creased signal detections without contributing 
to increased false responses. The erratic use 
of cueing during the training sessions sug- 
gests the reason for the failure of other ex- 
periments to produce consistently superior 
performance when the cueing techniques were 
rigidly applied. Annett and Paterson’s (1966) 
conclusion, that the characteristics of the 
specific task should control the training pro-
cedure, is strengthened by this experiment.
The suggestion of Swets et al. (1966) that S 
control his own training program also appears 
sound. 

The increased performance attributed to 
this training method is apparently due to two 
factors. The first concerns simply the in- 
creased signal information content provided 
to S. The second stems from the positive ef-
fects that the training regime has on Ss
when an interesting and self-participating
training sequence is interjected between two 
monotonous vigilance sessions. The Ss readily 
volunteered the opinion that the training op- 
tions made the task interesting and “fun.” 
We have previously noted the “pin-ball ma-
chine effect,” the self-motivating properties
of devices which allow man to pit himself 
against a machine, with no opportunity for 
reward other than self-satisfaction and KR 
(Wiener, 1967). These factors should be care-
fully considered in future experimentation
and practical use of autoinstructional trainers 
for perceptual tasks. Systems designers and 
training psychologists should be most eager 
to capitalize on what the trainee considers fun. 
KNOWLEDGE OF PERFORMANCE AS AN INCENTIVE
IN REPETITIVE INDUSTRIAL WORK


P. S. HUNDAL 1 
Panjab University, Chandigarh-14 


This study was designed to assess the purely motivational effects of knowledge 
of performance in a repetitive industrial task. The Ss were low paid workers 
with a few years (1-5 yr.) experience on the job. The experimental task was 
to grind a metallic piece to a specified size and shape. Experimental conditions 
were imposed a week before starting the experiment. The workers adjusted 
readily since the experimental conditions did not interfere with the work. 
Eighteen male workers were divided randomly into three groups. The Ss in 
Group A received no information about their output; Ss in Group B were 
allowed a rough estimate of their output; Ss in Group C were given accurate 
information about their output and could check it further by referring to a 
figure displayed before them. Results show increased output with increases
in degree of knowledge of performance.


A review of studies of the effects of knowl- 
edge of results on performance by Ammons 
(1954) suggests that knowledge of results 
(KR), universally, tends to improve the 
performance of Ss in laboratory situations. It 
is difficult, however, to know from most stud- 
ies whether the improvement is due to mo- 
tivational effects of knowledge of results or 
to some side effect such as “information” or 
“reward” which has not been controlled sys- 
tematically. 

Gibbs and Brown (1955), for the first time, 
tried to isolate and measure the motivational 
aspects of KR by designing an experiment so 
that KR was more casual and accidental than 
is usually the case. Under these conditions, 
they argued, the increase in output, if any, 
can be attributed to purely motivational im-
pact of KR. In their study they found sig-
nificant improvements in performance of Ss
as a function of KR. 

In a replication of the above study, with 
groups having different degrees of KR, 
Chapanis (1964) failed to confirm the Gibbs 
and Brown findings. Quite recently Locke 
and Bryan (1966, 1967) have reported that 
there is no effect of knowledge of scores on 
performance. Their studies show that perform- 
ance increases when Ss adopt goal setting 
procedures; that is, high goals lead to higher 
levels of performance than low goals. 


1 Requests for reprints should be sent to the au- 
thor, Department of Psychology, Panjab University, 
Chandigarh-14, India. 
Further, very little is known about the
purely motivational impact of KR on the 
performance of industrial workers in their 
actual work situation. 

The present study is a step in this direc- 
tion. The experiment was arranged on the 
lines of Chapanis (1964). Industrial workers 
acted as Ss and their performance on the job 
was manipulated in such a way that it was 
possible to control differential, casually given 
KR. It was hoped that under such conditions 
each S’s output would be determined entirely 
by self-competition and his own satisfaction 
in working. 


METHOD
Subjects 


The Ss were 18 male industrial workers employed 
in a small industrial unit. They had been with 
this factory for 1 to 5 yr. Their age ranged from 
24 to 37 yr. They belonged to the lower income 
group, their salary ranging from 150 rupees to 175 
rupees/mo. Their work was supervised by an as-
sistant manager, a partner in the concern. The Ss’
task was to “grind” a metallic piece to a specified 
size and shape. Finished pieces were placed in a 
nearby box. 


Procedures 


The 18 Ss were assigned randomly to one of 
three groups, 6 Ss to a group. 

Group A. The Ss in this group were required to 
keep their finished pieces in boxes fitted with flaps 
so they could not see how many they had done. 
Their boxes were emptied during the lunch break 
and at the end of the workday. 
TABLE 1

AVERAGE OUTPUT AND THE RANK OF THE PREEXPERIMENTAL
AND EXPERIMENTAL PERIODS

Preexperimental period

    Group A            Group B            Group C
Output    Rank     Output    Rank     Output    Rank
  38       6.0       41      17.0       40      13.0
  40      13.0       39       9.0       34       1.0
  38       6.0       41      17.0       40      13.0
  36       3.0       38       6.0       39       9.0
  39       9.0       35       2.0       37       4.0
  41      17.0       40      13.0       40      13.0

Experimental period

    Group A            Group B            Group C
Output    Rank     Output    Rank     Output    Rank
  40       5.5       38       2.5       48      18.0
  35       1.0       40       5.5       40       5.5
  38       2.5       47      17.0       45      15.0
  43      10.5       44      13.0       43      10.5
  44      13.0       40       5.5       46      16.0
  41       8.0       42       9.0       44      13.0

Note. Mean output for the preexperimental period: Group A, 38.7; Group B, 39.0; Group C, 38.3. Mean output for the experimental
period: Group A, 40.2; Group B, 41.8; Group C, 44.3.


Group B. Storage boxes for these Ss had no flaps. 
The Ss could see their mounting piles of finished 
pieces. The periodic cleaning of their boxes was 
always partial; thus, there always was a pile of 
finished pieces in their boxes at the beginning of 
each work session. The workers could, therefore, 
make a rough estimate of their output. 

Group C. The Ss in this group could also see 
their processed pieces in their respective boxes. After 
each work session their boxes were completely 
emptied. They could, therefore, make better estimates 
of their total output than workers in either of the 
other two groups. Moreover, they could also see a 
number, displayed on a card beside the box, that 
was an index of the amount of work completed 
during the preceding session. The Ss were not, how- 
ever, told what the number was. They could either 
draw an inference from it or ignore it. 

All the persons worked under the same roof. 
The Ss in different groups were put together by 
shifting their work seats. The experiment started 1 
wk. after they had worked in their new places 
under usual factory conditions. This week, the 
preexperimental period, was used to gather produc- 
tion records before undertaking the KR manipula- 
tion. A fixed quantity of raw material was supplied 
to all Ss at regular intervals. Because they were kept
constantly overloaded with raw material, they could
not estimate how much work they had done by 
referring to the stock of raw material. 

The average output of all three groups was 
comparable before undertaking the experiment. 

The experimental manipulation began on May 22, 
1967, and ended on the 27th of the same month. 
The entire arrangement was done by the supervisor. 
The E did not come in direct contact with S; he was, 
however, available for consultation. 

The hypothesis under study was that the output 
of these Ss should increase under the impact of 
subtle variations in knowledge of performance. 


RESULTS 


The basic data gathered are the average 
number of pieces processed by each S in the 
preexperimental and the experimental periods. 
The significance of differences in output be- 
tween the three groups at both stages was 
checked by Kruskal-Wallis’ one-way analysis 
of variance (Siegel, 1956). 

The average output for each S and for 
each group is reported in Table 1 separately 
for the preexperimental and experimental 
periods. All Ss under these two conditions 
were ranked separately on the basis of their 
output. The rank values, given in Table 1, 
were then used to compute H values as de- 
tailed by Siegel (1956, Formula 8.1). The H 
values for the preexperimental and experi-
mental periods worked out to be .28 and 4.17,
respectively. The latter value is significant at 
the .07 level (one-tailed test). 
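The ranking and H computation described above can be sketched as follows, using the outputs listed in Table 1. The sketch assumes the usual form of H without a correction for ties, H = 12 / (N(N + 1)) × Σ(R_j² / n_j) − 3(N + 1), where R_j is the rank sum and n_j the size of group j; the function names are hypothetical. It is offered as an illustration of the computation, not as a reproduction of the reported figures.

```python
def average_ranks(values):
    """Rank values from low to high, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1       # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H for a list of groups, all scores ranked together; no tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Average outputs from Table 1 (Groups A, B, C).
pre = [[38, 40, 38, 36, 39, 41], [41, 39, 41, 38, 35, 40], [40, 34, 40, 39, 37, 40]]
exp = [[40, 35, 38, 43, 44, 41], [38, 40, 47, 44, 40, 42], [48, 40, 45, 43, 46, 44]]

h_pre = kruskal_wallis_h(pre)
h_exp = kruskal_wallis_h(exp)
print(round(h_pre, 2), round(h_exp, 2))
```

With three groups, H is referred to the chi-square distribution with df = 2.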

Second, mean output of each of the experi- 
mental groups was compared with that of its 
counterpart in the preexperimental period. 
The main objective was to examine how group 
performance during the experimental period 
compared with earlier performance during the 
preexperimental period. Significance of dif-
ferences was estimated by means of the Wil-
coxon matched-pairs signed-ranks test (Siegel,
1956, pp. 77-79). Three comparisons were 
made, one each for Groups A, B, and C. 
Differences for Groups A and B were not sig-
nificant. The difference for Group C was sig-
nificant at the .025 level (one-tailed test).
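The within-group comparison can be illustrated with an exact signed-ranks computation on the Group C outputs from Table 1. Two assumptions are made here and are not stated in the text: that each row of Table 1 refers to the same S in both periods, and that zero differences are dropped before ranking. The function names are hypothetical, and the one-tailed probability is obtained by enumerating all assignments of signs to the ranks.

```python
from itertools import product


def average_ranks(values):
    """Rank values from low to high, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def signed_rank_exact(before, after):
    """Return (sum of ranks of negative differences, exact one-tailed p).

    p is the probability, under random signs, of a negative-rank sum as
    small as or smaller than the one observed (tests for an increase).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    as_extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        if sum(r for s, r in zip(signs, ranks) if s) <= t_minus:
            as_extreme += 1
    return t_minus, as_extreme / 2 ** len(ranks)


# Group C outputs from Table 1 (rows assumed to be paired by S).
pre_c = [40, 34, 40, 39, 37, 40]
exp_c = [48, 40, 45, 43, 46, 44]
t_minus, p_one_tailed = signed_rank_exact(pre_c, exp_c)
print(t_minus, p_one_tailed)
```

Because every S in Group C shows an increase, the negative-rank sum is zero and only one of the 64 possible sign assignments is as extreme, which is consistent with significance at the .025 level.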


DISCUSSION


A look at Table 1 reveals that the mean 
output of the groups increases in the experi- 
mental period in direct relation to their de- 
gree of awareness of their performance. The 
difference approaches statistical significance 
(p < .07, one-tailed test). 

The second comparison involving intra- 
group differences between preexperimental 
and experimental periods shows the greatest 
(and statistically significant) increases in the 
output of Group C. The corresponding in-
creases in Groups A and B are not significant.
Both approaches to the analysis of results 
agree in showing that output tends to increase 
under the presumed impact of the purely 
motivational aspects of KR. 


WORK SHIFT, OCCUPATIONAL STATUS, AND THE
PERCEPTION OF JOB PRESTIGE 1


RONALD H. BOHR 2 AND ARNOLD B. SWERTLOFF 3
Ratings of job prestige provided by 170 respondents in two widely different
work settings, a state mental hospital and an electronics plant, revealed a sig- 
nificant interaction between work shift and occupational status level. As 
hypothesized, nonsupervisory night workers (hospital attendants and assembly 
line workers) attributed less prestige to own and co-workers’ jobs than did 
their counterparts on the day shift, whereas no differences were found between 
the ratings of day and night supervisory personnel (registered nurses, industrial 
engineers, and foremen). These results suggest that situational characteristics 
of the work environment are more potent determinants of the job attitudes 
of lower-echelon than of upper-echelon workers. 


It is surprising that while work shift has 
been found in several investigations to be 
closely related to a wide range of social, psy- 
chological, and physiological variables (Mott, 
Mann, McLoughlin, & Warwick, 1965, Ch. 1),
it is generally ignored in studies of job at- 
titudes. This study presents data gathered 
from day- and night-shift workers on one 
significant attitude, namely, the perception 
which workers have of the prestige of their 
own and co-workers’ jobs. 

Specifically, it is hypothesized that prestige 
ratings are related to an interaction between 
work shift and the status of the rater, and 
that shift and prestige ratings are related for 
low status (nonsupervisory) workers, with 
night workers attributing less status to jobs 
than day workers, but not for high status 
(supervisory) workers regardless of shift. It 
is theorized that while the job perceptions 
of low status workers are closely related to 
certain situational characteristics of their 
work setting (like their hours of work), those 
of high status workers are relatively self- 
directed and impervious to such situational 
determinants. Hence, since night work is gen- 
erally less prestigious than day work, low 
status night workers might feel that all night 


1 An earlier version of this paper was reported at 
the Eastern Psychological Association, Washington, 
D. C., April 1968. 

2 Requests for reprints should be sent to Ronald 
H. Bohr, Coordinator, Psychosocial Research, Re- 
habilitation Center, Building S-9, Philadelphia, Penn- 
sylvania 19114. 

3 Now at the Internal Revenue Service, Washing-
ton, D. C.


workers hold jobs of low prestige. Conversely, 
if the perceptions of high status night em-
ployees are determined by internalized refer-
ents (e.g., a “cosmopolitan” professional
identity), they will not differ from their 
counterparts on the day shift. 


METHOD


This hypothesis was tested in two widely differ- 
ent work settings, a state mental hospital and an 
electronics plant, in order to give some indication 
of the generalizability of the findings. Occupational 
status in both settings was defined as the occupancy 
of a supervisory or a nonsupervisory job. In the 
hospital, which operated on a three-shift schedule, 
day workers were contrasted with both evening and 
night workers. The electronics plant had only two
shifts. In all, 170 workers were employed as Ss. 


Subjects 


State mental hospital. Of the 234 nursing service 
personnel at a 600-bed hospital, 96 completed the 
ratings of job prestige. Of these 96 respondents, 24 
were nurses (65% of all nurses) and 72 were at-
tendants (37% of all attendants). Half of each
group worked on the day shift and the other half 
on either the evening or night shift. 

Electronics plant. A random stratified sample of 
74 engineers, foremen, and assembly line workers was 
selected from a total of over 3,000 plant employees. 
The day shift sample included 23 engineers and 
foremen (supervisory personnel) and 23 assembly 
line workers (nonsupervisory personnel). The night 
shift sample included 14 employees in each of the 
two categories. 


4 Thanks are extended to Aaron Smith of Haver- 
ford State Hospital, Haverford, Pa., and to Albert 
Lagore of the Ford-Philco plant, Philadelphia, Pa., 
for their assistance in obtaining respondents. 
TABLE 1

ANALYSIS OF VARIANCE OF THE PRESTIGE RATINGS
OF MENTAL HOSPITAL NURSING PERSONNEL

Source            df       MS          F
Shift (A)          1     829.00      11.95*
Raters (B)         1     236.00
Jobs rated (C)     1    4116.00     101.25**
A × B              1     360.00       5.19*
A × C              1      11.00
B × C              1      94.00
A × B × C          1       1.00


Procedure 


The instrument used to determine the prestige of 
various occupations, a modified form of Mishler and 
Tropp’s (1956) status scale, was composed of a 
list of 14 community jobs (with corresponding 
numerical values) arranged in descending order from 
Supreme Court Justice (100) to Garbage Collector 
(50). Respondents indicated for each of several hos- 
pital or factory jobs the community job most 
equivalent in prestige. Hospital personnel rated the 
prestige of Nurse and Attendant; plant employees 
rated the prestige of Industrial Engineer, Foreman, 
and Assembly Worker. 


RESULTS 


Separate analyses of variance were per- 
formed on the ratings made by hospital and 
industrial workers. As Table 1 indicates, 
the 2 × 2 × 2 (Shift × Raters × Jobs Being
Rated) analysis of the hospital data revealed
a significant interaction between shift and
rater, F (1, 92) = 5.19, p < .05. Overall mean
differences between shifts were computed by 
subtracting night shift means from day shift 
means. Overall mean differences between day 
and night shift for nurses and attendants 
were -.60 and 5.6, respectively.

Table 2 shows the 2 × 2 × 3 (Shift ×
Raters × Jobs Being Rated) analysis of vari-
ance for the industrial sample. A significant
interaction between shift and rater was ob-
tained, F (1, 70) = 5.08, p < .05. Overall
mean differences between day and night shift
workers were -1.00 for engineers and fore-
men and 3.60 for assemblers. Since the ratings
of both types of supervisory personnel (engi- 
neers and foremen) were combined, Table 2 
shows one degree of freedom for raters (B) 
but two for the three jobs rated (C). 

In terms of the specific community jobs 




seen as equivalent, all hospital nurses felt 
they had slightly more prestige than public 
school teachers (M[day]= 87.4; M[night]= 
88.1), and that attendants were similar to 
typists (M[day] = 75.8; M[night] = 76.3). 
Among attendants, however, day workers con- 
sidered themselves more prestigious than elec- 
tricians (M = 83.4), while night workers 
rated themselves just above typists (M = 
77.1). Similarly, while day attendants thought 
nurses resembled airline pilots (M = 91.3), 
night attendants rated them with teachers 
(M = 86.1). In the factory, all supervisory 
personnel thought themselves similar to teach- 
ers (M[day]= 85.3; M[night]= 86.1), and 
considered assemblers comparable to semi- 
skilled workers in an auto factory (M[day] 
= 67.7; M[night]= 68.7). However, while 
night assemblers concurred in this judgment 
(M = 67.0), day assemblers felt they were 
more like store clerks (M = 70.4). Also, while
day assemblers viewed supervisors as just be- 
low teachers in prestige (M = 83.6), night 
assemblers rated them equal to electricians 
(M = 80.0). 


DISCUSSION


The present results support the hypothesis 
that prestige ratings of low status workers are 
related to their hours of work, while those of 
high status workers are not. In light of the 
concern expressed about shift work (Mott 
et al., 1965; Pearlin, 1962), it seems note- 
worthy to illustrate one way in which it is 
differentially related to job attitudes. 

Although further research is needed, it is 
suggested here that the present results reflect 
a difference in the frame of reference of vari- 


TABLE 2

ANALYSIS OF VARIANCE OF THE PRESTIGE RATINGS
OF INDUSTRIAL EMPLOYEES

Source            df       MS          F
Shift (A)          1      87.00
Raters (B)         1     206.00
Jobs rated (C)     2    6375.00     190.24*
A × B              1     262.00       5.08*
A × C              2       0.00
B × C              2     322.00       9.61**

* p < .05.
** p < .01.




ous workers. Indeed, a previous study (Bohr 
& Goldman, 1967) demonstrated that prestige 
ratings were related to the nature of the im- 
mediate work setting for attendants but not 
for upper status mental hospital employees. 
Both investigations indicate that situational 
characteristics of the work environment are 
more potent determinants of the job attitudes 
of lower-echelon than of upper-echelon 
workers. 
Two thousand twenty-six questionnaires from managers (supervisors) of a
government agency, Veterans Administration, Department of Medicine and 
Surgery (VA-DM&S), were evaluated and compared with Porter’s (1962) 
Business and Industry (B&I) sample. Satisfaction decreased from top- to 
lower-management levels and the greatest satisfaction deficit at all levels was in 
autonomy and self-actualization for both DM&S and B&I Ss. Dissatisfaction 
for DM&S was markedly greater than for B&I, confirming the study of Paine, 
Carroll, and Leete (1966) who found 95 government managers less satisfied 
than B&I Ss. Government’s lag in the human relations area contrasted with 
B&I’s growing people-centered orientation is offered as a possible explanation
for the need-satisfaction differences between the two groups. 


The dimensions of the personnel problem 
of the federal government have been of con- 
siderable concern for the past decade. David 
T. Stanley (1964) wrote: 


In the foreseeable future the federal government will
continue to compete with commerce and industry
for personnel possessing skills that are scarce
and will continue to be scarce. . . . The government’s
competitive position will be better than it
was because of the 1964 salary increases and because
the market . . . is easing a little as government
procurement is cut down. Nevertheless, trend
analyses generally show long-term shortages. . . . At
present it is uncertain whether the educational system
can meet these shortages. They will be mitigated,
but not really solved, by in-service training, outside
training assignments, and personnel utilization improvements
[p. 16].


Stanley’s perception of the problem is not 
an isolated one. Numerous committee reports 
and such publications as Executives for Gov- 
ernment (David & Pollock, 1957), The Amer- 
ican Federal Executive (Warner, Van Riper, 
Martin, & Collins, 1963), and The Job of the 
Federal Executive (Bernstein, 1958) identify 
and explain the government’s strengths and 
weaknesses as it competes for talented man- 
power in the 1960s. 


1 Requests for reprints should be sent to Jesse B. 
Rhinehart, Psychology Service, Veterans Adminis- 
tration Hospital, Coatesville, Pennsylvania 19320. 

2 At Veterans Administration Hospital, Downey,
Illinois.

3 Now at Veterans Administration Hospital, Hines,
Illinois.

4 Now with Department of Health, Education and 
Welfare, Chevy Chase, Maryland. 


In 1959, Kilpatrick, Cummings, and Jennings
(1964) initiated a unique approach to
the personnel crisis in government. In a large- 
scale survey in which personal interviews were 
conducted with more than 5,000 respondents 
on standardized questionnaires, the research- 
ers attempted to (a) analyze the occupational 
values and attitudes toward work that prevail 
in American society today and (b) ascertain
the attitudes of various groups in the Ameri- 
can public toward the federal civilian service 
generally and toward the federal service as 
an employer. The most significant single con- 
clusion emerging from the study was this: 
The image of federal employment is markedly 
out of phase with the occupational values of 
those whom the service most needs to at- 
tract to its ranks. 

Shortages of manpower in key occupations 
of the government service, plus the indifferent 
image of federal employment noted in the 
preceding research citation, served to motivate 
a second major research project, the results 
of which were published under the title The 
Higher Civil Service (Stanley, 1964). Originally
proposed as an evaluation of “rank-in-job”
versus “rank-in-man” principles, this
study of 16,000 federal executives broadened 
into a study of policies, procedures, and in- 
stitutional concepts for the purpose of achiev- 
ing a realistic assessment of the government’s 
use of higher scientific, professional, and man- 
agerial employees. In one segment of the 
study, a small number (80) of former federal 
employees were asked why they had resigned.
When the results were analyzed, it was found 
that 34% reported leaving for new oppor- 
tunities and new kinds of work experiences 
and 20% reported leaving because of dis- 
satisfaction with programs, policies, col- 
leagues, and frustrations. It can be assumed, 
in short, that more than one-half of these 
former employees left their government posi- 
tions because of failure to find their jobs 
satisfying. Herein may reside the core of the 
federal manpower problem. 

Porter (1962) employed a questionnaire 
utilizing a modified Maslow-type (1954) 
categorization of needs to determine how man- 
agers in business and industry feel about 
their jobs. Subsequently, similar investiga- 
tions have been extended to the military 
(Porter & Mitchell, 1967) and to union officials
(Miller, 1966). To date, only one study
(Paine, Carroll, & Leete, 1966) using Porter’s 
technique has assessed the job satisfaction of 
civil service personnel. When the job satisfac- 
tions of 95 managers in a government agency 
were compared with the job satisfactions of 
Porter’s group from business and industry, 
the former evidenced less satisfaction in all 
need items included in the questionnaire. 

The present study, the first of six projected, 
proposes to determine how managers (super- 
visors) of the Department of Medicine and 
Surgery of the Veterans Administration com- 
pare with their counterparts in business and 
industry (Porter, 1962) in terms of the extent 
to which five need areas (Security, Social, 
Esteem, Autonomy, Self-Actualization) are 
met by their jobs. The resultant information, 
it is believed, can serve ultimately to reduce 
the manpower problem of one segment of a 
federal agency by providing a base for pro- 
gram innovations which will (a) decelerate 
personnel attrition through resignations and 
(b) increase the job performance and satisfactions
of managers, with concomitant improvement
in the productivity of their subordinates.


METHOD 
Procedure and Sample 


Porter’s (1961) questionnaire, the data sheet of 
which was revised to fit the agency position structure
more appropriately, was distributed to supervisory
personnel in 148 hospitals, domiciliaries, and
outpatient clinics through personnel officials desig- 
nated as coordinators for this project. The number 
of questionnaires mailed was based upon estimates 
provided by each coordinator of the number of 
supervisory personnel at his station eligible to par- 
ticipate in the study. Those eligible to participate 
were supervisors drawn from the three classes of 
positions in the Department of Medicine and Sur- 
gery (DM&S) of the agency: (a) Title 38, including 
physicians, dentists, and nurses in the medical pro- 
fessions, (b) General Schedule, comprising profes- 
sional, semi-professional, administrative, technical 
and clerical personnel, nonmedical professional and 
white collar positions, and (c) Wage Administration 
Supervisors, consisting of skilled and unskilled em- 
ployees in trades, crafts, maintenance, and house- 
keeping. 

To elicit cooperation and assure anonymity for 
each participant, the coordinators were asked to as- 
semble the potential participants in groups as large 
as were feasible, to explain to them the nature of 
the study and its potential value, and to request 
their cooperation. Further, they were to provide an 
opportunity to answer any questions about the study, 
and to distribute packets of materials. The individual 
packets included the questionnaire and a letter from 
the principal investigator with instructions for com- 
pleting the questionnaire and directions for mailing 
it in an attached, franked envelope addressed to the 
Psychology Department of a cooperating university 
where the envelope was to be opened and destroyed 
to eliminate postmark identification. The question- 
naires were then mailed, in bulk, to the principal 
investigator. 

This procedure, geared to permissive, anonymous 
participation, resulted in a return of 9,841 questionnaires
of the 16,293 mailed to the local coordinators, or
60%. For replication purposes, however, only the 
first 2,026 questionnaires were employed for com- 
parison with Porter’s (1962) sample of 1,916 from 
business and industry. All of the completed question- 
naires will be utilized for future, extensive, intra- 
agency studies. 

For the 2,026 questionnaires constituting the 
DM&S data of this study, subdivision into four management
levels seemed most appropriate. Therefore,
for comparison purposes, it was necessary to con- 
vert Porter’s five management levels to four by 
combining his two top levels into one, designated 
here as Top Management and consisting of the 114 
cases in his President category (16%) and the 611
cases in his Vice-president category (84%). The resulting
organization of the two groups of data into
four management levels is shown in Table 1. 
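
The sample figures quoted above can be checked with a few lines of arithmetic. The following Python sketch is an illustration only and was not part of the original procedure; the variable names are hypothetical, and the level totals are the ones reported for the two samples in Table 2.

```python
# Counts quoted in the Procedure and Sample text.
returned, mailed = 9_841, 16_293
presidents, vice_presidents = 114, 611

# Level totals (Top, Upper-middle, Lower-middle, Lower) as reported in Table 2.
bi_totals = [presidents + vice_presidents, 659, 431, 101]
dms_totals = [128, 668, 591, 639]

return_rate = returned / mailed
print(f"return rate: {return_rate:.1%}")
print(f"B&I Top Management n: {bi_totals[0]}")
print(f"B&I sample n: {sum(bi_totals)}")
print(f"DM&S sample n: {sum(dms_totals)}")
```

The level totals add up to the 1,916 and 2,026 cases cited in the text, and the return rate rounds to the reported 60%.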

Some characteristics of the B&I data and DM&S
data and the distribution of totals by four management
levels and by four age levels are shown in
Table 2.

Comparison of the two groups indicates that 
DM&S respondents are older at all management 
levels, and, except for Top Management, have lower 
educational levels. 
TABLE 1

Comparison of the Four Management Levels in
the Business and Industry and Department
of Medicine and Surgery Data

Management                                    Department of medicine
level           Business and industry         and surgery

Top             Presidents                    Directors
                Vice-presidents               Assistant directors
                                              Chiefs of staff

Upper-middle    Division managers             Chiefs
                Plant managers                Assistant chiefs of
                Department managers             divisions and services

Lower-middle    Approximate level of          All between lower- and
                  department and                upper-middle, primarily
                  subdepartment managers        section chiefs

Lower           First- or second-level        Unit chiefs
                  supervisors


The Questionnaire and Its Scoring 


Only one segment of Porter’s questionnaire, the 
rationale of which has been described elsewhere 
(Porter, 1961) provided data for this study. The 
results reported here are based on responses to 13 
items relevant to a Maslow-type classification and 
organized according to their prepotency into five 
types of needs: security, social, esteem, autonomy, 
and self-actualization. For each of the 13 specific 
items (e.g., “the feeling of esteem”) respondents
were asked to indicate on a 1-7 rating scale: (a)
How much is there now? (b) How much should
there be?

The amount of need satisfaction experienced in his 
management position by each DM&S respondent for 
each of the 13 items was determined by subtracting 
his response to Part a of the item from his re- 
sponse to Part b of the item. Individual scores were 
then averaged on all 13 items for each of the four 
management levels and reported in terms of the five 
need categories. Either a sign test or signed-rank 
test was used to test for statistical significance 
(Siegel, 1956). 
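
The scoring rule just described can be sketched in Python. This is a minimal illustration rather than the authors' scoring program: the function names, the two made-up respondents, and in particular the number of items assigned to each need category are assumptions, since this article states only that 13 items fall into five categories.

```python
from statistics import mean

# Assumed item-to-category map (item positions 0-12). The article does not
# list which items belong to which category; these counts are hypothetical.
CATEGORY_ITEMS = {
    "Security": [0],
    "Social": [1, 2],
    "Esteem": [3, 4, 5],
    "Autonomy": [6, 7, 8, 9],
    "Self-Actualization": [10, 11, 12],
}


def deficiency_scores(is_now, should_be):
    """Per-item score: Part b ("should be") minus Part a ("is now")."""
    if len(is_now) != len(should_be):
        raise ValueError("both parts must cover the same items")
    return [b - a for a, b in zip(is_now, should_be)]


def category_means(respondents):
    """Average deficiency per need category over a group of respondents.

    respondents: list of (is_now, should_be) pairs, 13 ratings each (1-7).
    """
    per_item = [deficiency_scores(a, b) for a, b in respondents]
    return {
        category: mean(scores[i] for scores in per_item for i in items)
        for category, items in CATEGORY_ITEMS.items()
    }


# Two invented respondents at one management level.
group = [
    ([5, 6, 5, 4, 5, 4, 3, 4, 4, 3, 4, 5, 4],
     [6, 6, 6, 5, 6, 5, 5, 5, 5, 5, 6, 6, 6]),
    ([6, 5, 6, 5, 5, 5, 4, 4, 5, 4, 5, 5, 5],
     [6, 6, 6, 6, 6, 6, 6, 5, 6, 5, 6, 7, 6]),
]
print(category_means(group))
```

Because the score is "should be" minus "is now," a larger value means a larger perceived deficiency, that is, less satisfaction.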
RESULTS


Table 3 compares the need satisfaction of
B&I and DM&S supervisors or managers on the basis
of level of position. It should be noted that, following
the example set by Paine, Carroll, and
Leete (1966), in Table 3 the term “average 
satisfaction” is substituted for Porter’s “per- 
ceived deficiencies in need fulfillment” with 
high numbers indicating less satisfaction. 
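
As a reading aid for Table 3, the bottom row ("M of 13 items") can be related to the five category rows. The Python sketch below assumes that the 13 items are split 1, 2, 3, 4, and 3 across Security, Social, Esteem, Autonomy, and Self-Actualization; that split is not stated in this article and should be treated as an assumption. Under it, the overall mean is the item-weighted mean of the category means, which is checked here for three columns of the table.

```python
# Assumed number of questionnaire items per need category (sums to 13).
ITEM_COUNTS = {
    "Security": 1,
    "Social": 2,
    "Esteem": 3,
    "Autonomy": 4,
    "Self-Actualization": 3,
}


def overall_mean(category_means):
    """Item-weighted mean of the category means."""
    total_items = sum(ITEM_COUNTS.values())
    weighted = sum(ITEM_COUNTS[c] * m for c, m in category_means.items())
    return weighted / total_items


# (category means, reported "M of 13 items") for three Table 3 columns.
columns = {
    "Top, DM&S": ({"Security": 0.63, "Social": 0.53, "Esteem": 0.55,
                   "Autonomy": 0.84, "Self-Actualization": 0.99}, 0.74),
    "Lower, B&I": ({"Security": 0.80, "Social": 0.61, "Esteem": 1.21,
                    "Autonomy": 1.47, "Self-Actualization": 1.70}, 1.28),
    "Lower, DM&S": ({"Security": 1.32, "Social": 0.70, "Esteem": 1.04,
                     "Autonomy": 1.32, "Self-Actualization": 1.53}, 1.21),
}

for label, (means, reported) in columns.items():
    print(f"{label}: computed {overall_mean(means):.2f}, reported {reported:.2f}")
```

For these three columns the computed and reported values agree to two decimals, which is consistent with the assumed item split but does not prove it.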

Porter (1962) concluded that the more 
satisfied managers tended to cluster at the 
highest management level and that satisfac- 
tion tended to decrease at each successive 
lower level of management. When this con- 
clusion is evaluated on the basis of Porter’s 
data as presented in Table 3, the projected 
pattern of decreasing need satisfaction is ap- 
parent; a similar pattern prevails for the 
DM&S data. The differences constituting the
patterns are significant at the .01 level by 
signed-rank tests. 

From Table 3 it may be observed that 
DM&S managers are consistently more dis- 
satisfied than B&I managers at the top three 
management levels. These differences were 
found to be statistically significant at the .01 
level for the top and upper-middle levels and 
at the .001 level for the lower-middle groups. 
When the two lower management levels are 
compared, this greater dissatisfaction for 
DM&S is not demonstrated. It is at this 
point in the management hierarchy, it should 
be noted, that the demographic data reveal 
the greatest disparity in educational back- 
grounds with 76.2% and 18%, respectively, 


TABLE 2

Comparison of DM&S with B&I in Terms of Distribution of N of Total Sample
by Four Management Levels and Four Age Groups, and Characteristics
of Sample by Management Level

                                   Age group                       Total N     Median     % College
Management          20-34       35-44       45-54        55+      for level      age      graduates
level              B&I  DM&S   B&I  DM&S   B&I  DM&S   B&I  DM&S   B&I  DM&S   B&I  DM&S   B&I  DM&S

Top                 59     0   271    11   263    58   132    59   725   128  46.0  54.2  74.0  85.0
Upper-middle        95    33   288   165   206   298    70   172   659   668  43.1  49.6  75.0  71.0
Lower-middle       100    43   208   173    98   281    25    94   431   591  40.6  47.8  75.0  56.0
Lower               32    57    46   201    14   308     9    73   101   639  39.2  47.0  76.2  18.0
TABLE 3

Comparison of B&I and DM&S in Terms of Average Need Satisfaction Scores
for Each Need Category and Four Management Levels

                                                Management level
Need category               Top          Upper-middle      Lower-middle          Lower
                          B&I     DM&S      B&I     DM&S      B&I     DM&S      B&I     DM&S
Security                  .40      .63      .40 [illeg.]      .38      .56      .80     1.32
Social                    .30      .53      .33      .48      .20 [illeg.]      .61      .70
Esteem                    .43      .55      .66      .82 [illeg.]      .88     1.21     1.04
Autonomy                  .50      .84      .87      .98      .95     1.24     1.47     1.32
Self-actualization        .87      .99     1.12     1.26 [illeg.]     1.47     1.70     1.53
M of 13 items             .53      .74      .76      .90      .81     1.04     1.28     1.21


Note.—The larger the number, the less the need satisfaction. 


for the B&I and DM&S lower level supervisors.
This suggests the possibility that the
greater dissatisfaction of these B&I man- 
agers in comparison with their DM&S coun- 
terparts may be a function of their higher 
educational level which could be indicative 
of higher levels of aspiration. 
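
The mechanics of a sign test on the level-by-level comparison described in this paragraph can be sketched as follows. This is an illustration only: the article does not say which observations were paired in its own tests, so pairing the five category means from Table 3 is an assumption, and the p values printed here are not the significance levels reported in the text.

```python
from math import comb


def sign_test(pairs):
    """Exact two-sided sign test for (first, second) pairs; ties are dropped."""
    plus = sum(1 for first, second in pairs if second > first)
    minus = sum(1 for first, second in pairs if second < first)
    n = plus + minus
    smaller = min(plus, minus)
    # One-tail probability of so few signs in the rarer direction under a
    # fair-coin null hypothesis; doubled below for the two-sided test.
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)


# (B&I, DM&S) category means from Table 3, in the order Security, Social,
# Esteem, Autonomy, Self-Actualization.
top = [(0.40, 0.63), (0.30, 0.53), (0.43, 0.55), (0.50, 0.84), (0.87, 0.99)]
lower = [(0.80, 1.32), (0.61, 0.70), (1.21, 1.04), (1.47, 1.32), (1.70, 1.53)]

for label, pairs in (("Top", top), ("Lower", lower)):
    plus, minus, p = sign_test(pairs)
    print(f"{label}: DM&S higher in {plus}, B&I higher in {minus}, p = {p:.4f}")
```

At the Top level every category mean is higher for DM&S, whereas at the Lower level the direction is mixed, which matches the qualitative pattern described in the text.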

When both population samples were evalu- 
ated to determine their relative average satis- 
faction, rankings (from most to least de- 
ficient) for the five need areas revealed cer- 
tain similarities. For both groups Self-Ac- 
tualization and Autonomy ranked first and 
second; Social ranked fifth. However, B&I 
ranked Esteem third with Security fourth; 
the reverse ranking occurred for DM&S. 
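
The ranking procedure can be illustrated with the Top-level column of Table 3. The article ranked the need areas for the samples as a whole, so restricting the sketch to one management level is a simplification made here for illustration; the function name is hypothetical.

```python
# Top-level category means from Table 3 (larger = less satisfaction).
top_level = {
    "B&I": {"Security": 0.40, "Social": 0.30, "Esteem": 0.43,
            "Autonomy": 0.50, "Self-Actualization": 0.87},
    "DM&S": {"Security": 0.63, "Social": 0.53, "Esteem": 0.55,
             "Autonomy": 0.84, "Self-Actualization": 0.99},
}


def rank_by_deficiency(means):
    """Need categories ordered from most to least deficient."""
    return sorted(means, key=means.get, reverse=True)


for group, means in top_level.items():
    print(group, rank_by_deficiency(means))
```

For this column the ordering reproduces the pattern reported in the text: Self-Actualization and Autonomy first and second for both groups, Social last, and Esteem and Security exchanging third and fourth place between B&I and DM&S.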

Though the patterns of relative perceived 
need deficiencies for both groups were simi- 
lar, in all of the five need areas and for all 
management areas, the degree of dissatisfac- 
tion was significantly greater for DM&S (p 
< .01). When assessed in terms of the five 
individual need categories, the greater dis- 
satisfaction of DM&S supervisors was found 
to be statistically significant for the Social 
and Self-Actualization needs, both at the .01 
level. When the comparison is confined to the 
three top management levels, for both Self- 
Actualization and Autonomy the differences 
between B&I and DM&S are significant at the 
.05 level. For all other category comparisons
for all four levels of management as well as 
for the top three levels, differences between 
B&I and DM&S were in the direction of
greater dissatisfaction for DM&S supervisors,
but these differences were not found to be 
statistically significant. 

In a comparative study of need satisfac- 
tions in military and business hierarchies, 
Porter and Mitchell (1967), using three man- 
agement levels, concluded that the influence 
of military rank on need satisfaction may be 
greater than the effect of echelon level in 
business organizations. When the ranges of 
average satisfaction (for the total of all 13 
items) from the highest to the lowest of the 
top three B&I and DM&S management levels 
are observed in Table 3, it is apparent that 
the range for DM&S (.74-1.04) closely ap- 
proximates that for B&I (.53-.81), suggest- 
ing that the effect of echelon level is about 
the same for both these governmental and 
nongovernmental supervisors. 
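
The comparison of ranges amounts to a simple subtraction, sketched below in Python with the end-point values quoted in the preceding paragraph. The dictionary layout and function name are illustrative assumptions.

```python
# Overall means (M of 13 items) at the ends of the top three levels,
# as quoted in the text: highest level first, lower-middle last.
overall = {
    "B&I": {"Top": 0.53, "Lower-middle": 0.81},
    "DM&S": {"Top": 0.74, "Lower-middle": 1.04},
}


def spread(levels):
    """Width of the range of average satisfaction scores."""
    return round(max(levels.values()) - min(levels.values()), 2)


for group, levels in overall.items():
    print(f"{group}: range width {spread(levels):.2f}")
```

The two widths differ by only .02, which is the sense in which the DM&S range "closely approximates" the B&I range.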


CONCLUSIONS 


When managers (supervisors) from the De- 
partment of Medicine and Surgery of the 
Veterans Administration are compared with 
managers from business and industry, several 
conclusions are apparent: 

First, both groups show positive relation- 
ships between vertical location in the man- 
agement hierarchy and need satisfaction, with 
satisfaction decreasing as the management 
scale is descended. 

Second, for both groups the two highest 
order needs, Autonomy and Self-Actualiza- 
tion, are less well satisfied at all management 
levels. However, for the government supervisors
the Security need is less well satisfied
than Esteem, while the reverse is true for 
the managers from business and industry. 

Third, although the position-level-satisfac- 
tion patterns and need deficits were similar, 
the degree of dissatisfaction of the govern- 
ment group was markedly greater when as- 
sessed in terms of the over-all needs and all 
four management levels. When the top three 
groups are compared level-to-level, the DM&S 
supervisors reveal more dissatisfaction for 
each management segment. However, this find- 
ing does not hold for the lower level super- 
visors. 

Fourth, in contrast to the military, ap- 
parently the effect of echelon level on satis- 
faction is about the same for both the govern- 
mental and nongovernmental groups studied 
here. 

Fifth, the percentage of college graduates 
was higher for the business managers at the 
three lowest levels but higher for the govern- 
ment managers at the highest level, suggest- 
ing that a college degree is a greater requisite 
for reaching the top in government than in 
business. 

Sixth, at each level, the government man- 
agers were older than their counterparts in 
business. Whether this means that employees 
entered government service at a later age or 
required longer service before promotion to 
a supervisory role materialized could not be 
determined. 

In this and Porter’s study, it will be re- 
called that job satisfaction was quantified by 
subtracting the participant’s assessment of
“reality” (How much is there now?) from his 
“expectation” (How much should there be?). 
Dissatisfaction, therefore, is, in part, a func- 
tion of level of expectation. In short, the 
norms for government managers and business 
managers could be different, as pointed out 
by Paine, Carroll, and Leete (1966) in their 
study, the findings of which “. . . indicated 
that government agency managers had much 
less need satisfaction than private industry 
managers similar to them in age and or- 
ganizational level [p. 249].” These writers 
also suggested that studies of other govern- 
ment agencies would be needed to confirm 
their findings. The present study, employing 
a much larger number of respondents (2,026
as compared to 95), certainly seems to have 
produced sufficient confirmation of their find- 
ing. 

The answer to the question of why govern- 
ment managers are less satisfied than man- 
agers in private business is doubtless so com- 
plex that many aspects of the work climate 
must be taken into account. However, one 
possible facet may reside in the failure of 
government to emulate industry’s increasing 
concern with human relations factors in the 
work situation, a concern initiated by the 
famed Hawthorne Studies at the Western 
Electric plant in Chicago. These studies, com- 
pleted in 1927, represented the first honest 
and concerted effort to understand employees, 
instead of approaching the problem solely 
from the managerial point of view of im- 
proving efficiency on an economic level. For 
the past 35-40 yr., industry has made in- 
creasing use of the behavioral sciences in its 
slow but definite movement from rule-cen- 
tered and work-centered orientations to peo- 
ple-centered and group-centered orientations. 
Some substantiation of government’s lag in 
the human relations area is to be found in a 
1968 unpublished report of a questionnaire 
survey to which 108 federal executives re- 
sponded, indicating their preferences for vari- 
ous seminar topics: 

The results seem to indicate that, on the 
whole, respondents disliked topics and general 
areas associated with the behavioral sciences 
and with the psychological growth of the 
executive (Rhinehart, 1968). 

Perhaps an increased concern for human 
relations could contribute to increasing satis- 
faction among government managers. It would 
seem logical that increased manager-satisfac- 
tion would result in increased satisfaction 
among those they supervise. 
BEYOND PARKINSON’S LAW:


III. THE EFFECT OF PROTRACTIVE AND CONTRACTIVE 
DISTRACTIONS ON THE WASTING OF TIME 
ON SUBSEQUENT TASKS 1


DAVID LANDY, KATHLEEN McCUE, AND ELLIOT ARONSON 2


University of Texas 


It has been demonstrated previously that persons who, because of an “ac- 
cident,” are allowed more time than is necessary to perform an initial task 
will, of their own accord, spend a greater amount of time on a subsequent 
similar task than persons who are allowed a minimum amount of time to per- 
form the initial task. The present experiment replicates this excess time effect 
with the amount of time Ss spend working on the initial task being manipulated 
by means of a distracting confederate. During performance of the initial 
task, a confederate imposed on one-half of Ss a contractive distraction (one 
relatively decreasing work time) and on one-half of Ss a protractive distrac- 
tion (one relatively increasing work time). In a subsequent work situation, Ss 
performed a similar task in the absence of the confederate and were allowed 
to work as long as they chose. The Ss in the Protractive Distraction condition 
spent significantly more time working on the initial task and significantly more 
time on the second task than did Ss in the Contractive Distraction condition. 
This not only demonstrates mediation of the excess time effect by distraction, 
but also eliminates a possible artifact in the previous experiments—that E’s 
instructions conveyed differential time norms to Ss and were thus responsible
for the effect.


In an experimental demonstration of Park- 
inson’s Law that work expands to fill the time 
available, Aronson and Landy (1967) allowed 
Ss either 5 or 15 min. to perform a task which 
could be completed in 5 min. The Ss who were 
allowed the extra time spent a significantly 
greater amount of time actually working on 
the task than those who were allowed mini- 
mum time. In addition, Ss were subsequently 
presented with a similar task and allowed to 
work at their own pace. Again, Ss who were 
allowed excess time on the initial task spent 
significantly more time performing this second
task than those who were allowed minimum
time on the initial task. This latter phenomenon,
termed “the excess time effect,” had
previously been demonstrated by Aronson and 
Gerard (1966). 


1 This experiment was supported by a grant from 
the National Institute of Mental Health (MH 12357) 
to Elliot Aronson. The authors would like to thank 
Ira Levy, who served as the confederate throughout 
the experiment. 

2 Requests for reprints should be sent to Elliot 
Aronson, Department of Psychology, University of 
Texas, Austin, Texas 78712. 


The present experiment is an attempt to 
eliminate a possible artifact which might have 
accounted for the excess time effect in the 
Aronson-Gerard and Aronson-Landy experiments.
In both of these experiments the allocation
of time was manipulated in the following
manner: After E had explained the
first task to S, the departmental secretary 
burst into the experimental room and urgently 
reminded E that he had promised to help a
professor set up some apparatus, and that it 
would require about 5 min. (in the Minimum 
Time condition) or 15 min. (in the Excess 
Time condition). After some hesitation, E 
agreed to help. He suggested that S work on 
the task during E’s absence, and as he left 
the room, E said that he would be back in
5 (or 15) min. In describing these studies, 
the investigators reasoned that, since the time 
allotted for the tasks was made to appear accidental
and arbitrary, it could not have reflected
E’s judgment of how much time he
himself thought the task should consume. 
However, it is possible that, despite this pro- 
cedure, differential time norms may have been 
conveyed by E as to what constituted an appropriate
time to spend on the given task (see
Gerard, 1967). Specifically, in the Five-Minute
condition in the Aronson-Gerard (1966)
and Aronson-Landy (1967) experiments, by
stating that he would return in 5 min., E may 
have implied that he expected S to be finished 
within 5 min. In the present experiment, this 
possibility is eliminated in the following man- 
ner: All Ss are allowed excess time (15 min.) 
to perform the initial task; the amount of 
time which Ss actually spend working on the 
task is manipulated through the actions of a 
distracting confederate. 

Distraction is defined as any alternative 
activity which competes for a worker’s at- 
tention during the time which he has at his 
disposal to complete a specific task. Distrac- 
tion can be classified into two categories: pro- 
tractive distractions and contractive distrac- 
tions. Protractive distractions tend to diffuse 
or stretch out the amount of time a person 
will spend working on a given task; they 
divert attention from the given task without 
completely halting the work process. Contrac- 
tive distractions have a relatively opposite 
effect, tending to decrease or concentrate the 
amount of time a person will spend working 
on a task. This latter type of distraction is 
one which is clearly incompatible with work- 
ing on the given task, that is, a worker is un- 
able to perform his assigned task and simul- 
taneously attend to the distraction. Therefore, 
he tends to work harder on the assigned task 
so that he will finish it more quickly, enabling 
him to engage in or attend to the distracting 
activity. By behaving in a predetermined 
manner, a confederate should be able to im- 
pose on Ss either a protractive or contractive 
distraction and thus manipulate the amount 
of time Ss will spend actually working on the
initial task. 

The Aronson-Gerard and Aronson-Landy
experiments demonstrate that the amount of 
excess time an individual spends working on 
a task in one situation affects the amount of 
time he will spend working on a similar task 
on subsequent occasions. Thus, it was pre- 
dicted that workers exposed to a protractive 
distraction in an initial work situation will 
spend more time at a subsequent similar task 
in which there is no distraction than workers 
who are exposed to a contractive distraction 
in the initial work situation. Even in a situation
where excess time to complete a task is
provided to two groups of workers, those who 
are confronted with a protractive distraction 
will spend more time completing a subsequent 
similar task than those who are confronted 
with a contractive distraction during per- 
formance of the first task. In effect, then, the 
present experiment is a conceptual replication
of the Aronson-Gerard and Aronson-Landy
experiments. It is designed to demonstrate
more clearly that the excess time effect
is not merely an artifact of norms implied
by the experimental instructions, but is
a function of the amount of time consumed in 
the initial performance of the task. 


METHOD 


The Ss were 42 male undergraduates 3 who were
required to participate in psychological research in 
order to accumulate credits required in their intro- 
ductory psychology class. Individual Ss were ran- 
domly assigned to one of the two experimental con- 
ditions, that is, either Protractive Distraction or 
Contractive Distraction. 

The E met S and the confederate (posing as an- 
other S) in a departmental office and introduced 
herself as Dr. Aronson’s assistant. She led them both 
to an experimental cubicle in which there was a 
desk and three chairs. When S and the confederate 
were both seated, E apologized for the crowded con- 
ditions and explained that usually each S was alone 
in a separate cubicle, but that all of the other 
cubicles were in use at that moment. She said that 
she hoped that another cubicle would be available 
soon. 

At this point the nature of the task was intro- 
duced. The E informed S and the confederate that 
they were not participating in an actual experiment; 
rather, they would be helping E to prepare some 
demonstration materials for Dr. Aronson’s social psy- 
chology class. She explained that these demonstra- 
tions involved interpersonal perception. She also told 
them that they would receive full experimental credit 
for their participation, even though, strictly speak- 
ing, it was not an actual experiment. She then gave 
both S and the confederate three envelopes, each 
containing a set of eight photographs. The three sets 
of pictures given to S consisted of ordinary photo- 
graphs of men. The pictures given to the confederate, 
on the other hand, were all of women, some of 
whom were partially nude or in provocative poses. 
The E explained that the task was to rank the first 
set of photographs according to S’s own perception 


3 Two Ss in the Protractive Distraction condition 
were discarded: One because he left the experimental 
room during the first work period, and the other be- 
cause he had previously met the confederate and was 
suspicious about the distraction. 
of the intelligence of the people in the photographs,
the second set in terms of perceived warmth, and 
the third set according to perceived honesty. 

At this point, in the interests of strengthening 
credibility, the confederate asked a question of 
clarification: “You mean all we do is look at these 
cards and put them in correct order?” The E 
nodded, while assuring Ss that there were no ‘right’ 
or ‘wrong’ answers. 

The experiment had been planned so that at this 
time another confederate would knock on the cubicle 
door. He told E that she was needed to help set up 
some apparatus and that it would “Only take about 
15 minutes.” After some hesitation, E said that she
would be right there. She then apologized to S and 
confederate for the interruption. She explained that 
she had to leave but suggested that they work on 
the task while she was gone. As she left the room, 
she said, “I’ll be back in about 15 minutes.” 

The confederate then exposed S to one of the two 
experimental manipulations (distractions). The confederate
had randomly assigned S to a given condition
before he had met S and without E’s knowledge.

Contractive Distraction condition. In this condi- 
tion the confederate allowed S to work on his task 
for 1 min. and then commented, “Hey, are all yours 
girls?” He then showed S one of the nudes. All Ss 
expressed a desire to see the rest of the confederate’s 
pictures, but the confederate told S that he should 
hurry up and finish his rankings and then he could 
look over the nude photographs at his leisure. Twice 
more during the task the confederate urged S to 
hurry. When S had finished ranking all three sets 
of photographs, the confederate allowed S to see 
the pictures he had been ranking. He then engaged 
S in a conversation to keep him from going back to 
recheck his own task. 

Protractive Distraction condition. In this condition, 
the confederate allowed Ss to work without inter- 
ruption for about 1 min. He then began to distract 
S by shoving one of his pictures in front of him, 
giggling, and making the following comments: “How 
can you rank a nude girl on intelligence?” “Would 
I ever like to snuggle up with this one!” “What a
build!” “Say let me see your pictures,” etc. These 
interruptions were made at intervals of 15-30 sec. 
In both the Contractive and Protractive Distraction 
conditions the confederate finished his own rankings 
just after S. 

During this procedure E had entered an adjacent 
cubicle. A peep-hole device enabled her to measure 
the amount of time Ss actually spent working on the 
task and the distribution of time spent on the task 
over the 15-min. period during which E was absent. 
When 15 min. had elapsed, E returned to the ex- 
perimental cubicle and again apologized for the previ- 
ous interruption. She then asked S and the confeder- 
ate to read off the numbers on the back of the photo- 
graphs so that she could record their rankings. This 
was done simply to uphold the cover story. After 
recording the rankings, she said that she had man- 
aged to find another vacant cubicle so that each 
person could work in his own room. She handed S
another three sets of photographs (these were similar
in nature to the first three sets ranked by S) and 
explained that he was to rank the first set on in- 
telligence, the second on warmth, and the third on 
honesty, as he had done with the previous photo- 
graphs. She then requested that the confederate come 
with her, saying that he, too, would work on a 
similar task in the newly freed cubicle. As an after- 
thought, she turned back to S and said that after 
she had directed the confederate to the newly freed 
cubicle, she wanted to finish setting up the apparatus 
she had been working on. She assured S that she 
would be just down the hall and suggested that he 
simply knock loudly on the door of his experimental 
cubicle when he had finished ranking the pictures; 
she would come down to record his rankings. The 
E and confederate then left the room. The E ac-
tivated a stop-watch and again entered the adjacent
cubicle. When S knocked on the door, E recorded 
the elapsed time. She then returned to the experi- 
mental cubicle and recorded S’s rankings. The S was 
then interviewed. After satisfying herself that S was 
not suspicious about the role of the confederate or 
about the true purpose of the experiment, E fully 
explained the experiment to S, the nature of the 
deception, and the necessity for employing it. 


RESULTS AND DISCUSSION 


Before looking at the primary data, it would 
be prudent to examine our check on the ma- 
nipulations. In the initial work period during 
which the confederate manipulated the dis-
traction variable, it was assumed that an S
was actually working on the task if he was 
observed to be looking at the photographs 
which he was instructed to rank. In the initial 
15-min. work period the 20 Ss in the Contrac- 
tive Distraction condition spent a mean time 
of 217 sec. (approximately 3½ min.) ac-
tually working on the task while the 20 Ss
in the Protractive Distraction condition spent
a mean time of 395 sec. (approximately 6½
min.) actually working on the task. This dif-
ference is significant at beyond the .001 level
of probability (F = 42.25; df = 1, 39) and
indicates that the manipulation was effective; 
that is, Protractive Ss consumed reliably more 
time than Contractive Ss. 

It will be recalled that during the per- 
formance of the second task, Ss were allowed 
(a) to work at their own pace; that is, for as 
much time as they wished to spend on the 
task, and (b) to work by themselves; that is,
in the absence of the confederate. It was pre- 
dicted that under these circumstances Ss who 
had been exposed to a protractive distraction
during their performance of the initial task
would spend more time working on the sec- 
ond task than Ss who had been exposed to a 
contractive distraction in the initial work 
situation. The data clearly support this pre- 
diction. The Ss in the Protractive Distraction 
condition spent a mean time of 279 sec. (more
than 4½ min.) working on the second task
while Ss in the Contractive Distraction condi-
tion spent a mean time of 210 sec. (3½ min.)
working on the same second task under identi-
cal conditions. An analysis of variance showed
this difference to be significant at beyond the
.025 level (F = 5.44; df = 1, 39).4
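
The speed-score transformation mentioned in Footnote 4 can be sketched as follows. This is a minimal illustration, assuming only the definition given there (the reciprocal of time in seconds, multiplied by 100). The function name `speed_score` and the sample times are hypothetical and are not data from the experiment.

```python
def speed_score(time_sec: float) -> float:
    """Return 100 times the reciprocal of a completion time in seconds."""
    if time_sec <= 0:
        raise ValueError("completion time must be positive")
    return 100.0 / time_sec


# Hypothetical completion times in seconds (illustrative only).
times = [150, 200, 250, 600]
scores = [speed_score(t) for t in times]

for t, s in zip(times, scores):
    print(f"{t:4d} sec. -> speed score {s:.3f}")
```

Because the reciprocal compresses long times, one very slow worker shifts the speed scores much less than he shifts the raw times, which is why such a transformation is applied to skewed time measures.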

Since all Ss were allowed the same amount 
of excess time in which to perform the initial 
task (15 min.), and since E did not know to
which condition Ss had been assigned until 
after she left the room, it is impossible for 
the excess time effect to have been an artifact 
of demands or norms implied by E. Because 
these data are parallel to the results obtained 
by Aronson and Gerard (1966) and Aronson 
and Landy (1967) using a totally different 
manipulation, our confidence is increased re- 
garding the “realness” and generality of the 
effect. 

4 When the analysis was performed on speed scores,
that is, on reciprocals of time in seconds multiplied
by 100 (a procedure which is recommended by Miller
[1959, p. 283] to compensate for the frequently
skewed distribution of time measures), an F of 4.15
was obtained (df = 1, 38; p < .05).

The availability of excess time is a condi-
tion existing in many work situations. One of
the important determinants of just how much
of the available excess time will be utilized 
to work on the task at hand will be the pres- 
ence of distractions in the work situation. The 
present experiment demonstrates that dis- 
tractions in one work situation, in which a 
person is provided with excess time to per- 
form a task, will affect the amount of time 
that he will spend working on a subsequent 
similar task in which there is no distraction 
and in which the person works at his own 
pace. Whether the amount of time spent on 
the subsequent task tends to increase or de- 
crease will depend on the nature of the dis- 
traction to which the worker is exposed in the 
initial work situation. Protractive distractions 
in the initial work situation tend to increase 
the amount of time spent on a similar task in 
a subsequent situation while contractive dis-
tractions tend to decrease it.


Measures of six traits of creative ability were examined by the multitrait-
multimethod matrix. Structured tests, interviews, and supervisory ratings were
gathered from 63 scientists and engineers in a research laboratory. There was 
no evidence of convergent and discriminant validity for the measures of crea- 
tive ability, although two control traits in the matrix—Job Involvement and 
Time Extension—exhibited substantial validity. Implications of the findings
were discussed.


The purpose of this study is to assess the 
convergent and discriminant validity of six 
different measures of creative ability by the 
multitrait-multimethod technique (Campbell 
& Fiske, 1959). This technique is primarily 
concerned with the adequacy of tests as 
measures of a construct. It provides informa- 
tion for three important issues: (a) whether 
the trait can be observed under more than one 
experimental condition, (b) whether the trait 
can be meaningfully differentiated from other 
traits, and (c) how much of the variation 
between traits can be attributed to character- 
istics of the trait versus the measure of these 
traits. 

Three methods were used to measure the 
multiple traits: a paper and pencil test, a 
structured interview, and a paired comparison 
rating procedure. 

The creative ability traits were Sensitivity 
to Problems, Remote Association, Originality, 
Ideational Fluency, Spontaneous Flexibility, 
and Semantic Redefinition. They were se- 
lected on the basis of their relevance in the 
scientific research process and the empirical 
evidence supporting their validity (Guilford, 
1959, 1967 [pp. 162-166]; Mednick, 1962). 
The selection of the structured test for each 
trait was based on the degree of empirical 
validity evidenced in an adult working popu- 
lation and its face validity for the population 
in our study (Guilford, 1967, Ch. 6; Med- 
nick, 1967). 


1 Requests for reprints should be sent to Paul S. 
Goodman, Graduate School of Business, University 
of Chicago, 5836 Greenwood Avenue, Chicago, Il-
linois 60637.


METHOD
Sample 


Seventy-eight employees in a government research 
laboratory voluntarily participated in the study. 
Complete test, interview, and ratings were available 
for only 63 individuals and only these were used in 
the present analysis. 

Fifty-six percent of the individuals used in the 
analysis were involved in basic research, 43% in 
engineering problems, and 1% in the administrative 
area. Fifty-four percent of the sample had com- 
pleted or were in the process of completing work on 
their PhD degrees, 35% possessed Master’s or 
Bachelor’s degrees, and the remaining had at least 
some high school education. The average age of the 
group was 38; the range was 24-61 yr. 


Instruments 


The following is a brief description of the se- 
lected traits of creativity and the respective tests: 
Remote Association (RAT) was measured by the 
Remote Associates Test.2 The test is based on Med- 
nick’s (1962) conceptualization of creativity as a 
process of combining elements (preferably diverse) 
into new and useful combinations. Sensitivity to 
Problems (SP)—the ability to recognize problems—
was measured by the Seeing Problems Test (Guil- 
ford, 1967, p. 106). Ideational Fluency (IF)—the 
ability to call up ideas wherein quantity, not quality, 
is emphasized (Guilford, 1967, pp. 142-143)—was 
measured by the number of acceptable responses 
made to the A. C. Test of Creative Ability. Spon- 
taneous Flexibility (SF)—the ability to produce 
divergent responses (Guilford, 1967, p. 143)—was 
also measured by the A. C. Test. The scoring for 
this dimension focused upon the number of different 
classes of responses rather than the total number of 
responses. Originality (O)—the ability to produce 
remote, uncommon, or clever responses (Guilford, 
1967, pp. 153-158)—was operationalized by the rela-
tive infrequency of a response to the A. C. Test. The
last dimension, Semantic Redefinition (SR)—the
ability to shift the function of an object or particu-
lar object and use it in a new way (Guilford, 1967,
p. 181)—was measured by the Object Synthesis Test.

2 Scores on all tests were arranged to exhibit posi-
tive relationships.


TABLE 1

MULTITRAIT-MULTIMETHOD MATRIX OF SELECTED CREATIVITY MEASURES

Two control traits—Job Involvement and Time 
Extension—were introduced because they were con- 
sidered independent of the creative ability traits, a 
requirement in the multitrait-multimethod matrix. 
Job Involvement (JI) refers to the degree to which 
an individual is psychologically identified with his 
work. Time Extension (TE) refers to the length of
future time span which is conceptualized. The test 
measure of Job Involvement was developed by 
Lodahl and Kejner (1965), Time Extension by Good- 
man (1966). Both measures are Likert-type scales. 

In the interview the respondent was presented 
with a series of statements describing the traits under 
examination and then was asked to describe himself 
in terms of these statements along a 9-pt. scale. The 
rating procedure used the same stems as the inter- 
view. The rater’s task was to compare pairs of 
individuals in terms of the dimension under con- 
sideration. 

Reliability estimates for the tests were determined 
by the split-half procedure adjusted by the Spear- 
man-Brown formula. Since it was not possible to 
reinterview the same population, a retest coefficient 
was determined from a heterogeneous population: 
three student groups (N = 6; N = 11; N = 9), a
group of industrial chemists (N = 17), and a group
of social scientists employed at a research center
(N = 9). The median rank order retest coefficient is
probably understated because the range of responses 
was more constricted in this population and it was 
administered on a group rather than individual basis. 
The reliability estimate for the paired comparison is 
an average consistency score for all raters for a 
given dimension. 
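
The Spearman-Brown adjustment of a split-half coefficient can be sketched as follows. This is a minimal illustration of the standard formula, not the authors' actual computation; the function name and the half-test correlation of .60 are hypothetical.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Project the reliability of a test lengthened by `factor`.

    With factor=2 this is the usual split-half correction:
    r_full = 2 * r_half / (1 + r_half).
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Hypothetical correlation between the two halves of a test.
r_half = 0.60
print(f"full-length reliability estimate: {spearman_brown(r_half):.2f}")
```

The correction is needed because the correlation between two half-tests describes an instrument only half as long as the one actually administered.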


Administration Procedures 


The tests were administered in a group session 
which lasted approximately 3 hr. The interview time 
was approximately 1½ hr. and followed from 2-3
wk. after the test administration. In the rating, super- 
visors were given a deck of cards containing all 
possible pairs of individuals they agreed to rate for 
each trait. The rating took approximately 45 min.
and followed 4 to 6 wk. after the test administration.
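
The deck of cards used in the paired comparison procedure can be sketched as follows. This is a minimal illustration, assuming each card carries one unordered pair of ratees; the names `rating_deck` and `deck_size` and the ratee labels are hypothetical.

```python
from itertools import combinations


def rating_deck(ratees):
    """Return every unordered pair of ratees, one pair per card."""
    return list(combinations(ratees, 2))


def deck_size(n_ratees: int) -> int:
    """Number of cards needed for n ratees: n(n - 1) / 2."""
    return n_ratees * (n_ratees - 1) // 2


# Hypothetical supervisor who agreed to rate four individuals.
deck = rating_deck(["A", "B", "C", "D"])
print(len(deck), deck)
print(deck_size(10))  # cards needed for ten ratees, per trait
```

The number of cards grows roughly with the square of the number of ratees, and a full deck is needed for each trait, which bounds how many individuals one supervisor can reasonably compare.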


RESULTS 


Data for the multitrait-multimethod matrix
are presented in Table 1. Convergent validity
is indicated in the heteromethod-heterotrait
blocks by the values in the diagonal (e.g.,
RAT₁ RAT₂); discriminant validity is in-
dicated to the degree a diagonal value ex-
ceeds its corresponding row and column values 
(e.g., RAT; JI,). Since the matrix requires 
comparison between independent traits, this
analysis focuses on comparisons between each
creative ability trait and the control traits, 
Job Involvement and Time Extension. 
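
The comparison described above can be sketched for a single heteromethod block. This is a minimal illustration under stated assumptions: rows are traits measured by one method, columns are the same traits measured by another method, and the correlations are hypothetical values chosen for the example, not the entries of Table 1.

```python
def validity_diagonal(block):
    """Same-trait, different-method correlations (convergent validity)."""
    return [block[i][i] for i in range(len(block))]


def exceeds_row_and_column(block, i):
    """True if the diagonal value for trait i exceeds every other value
    in its row and column (the discriminant comparison)."""
    n = len(block)
    diag = block[i][i]
    row = [block[i][j] for j in range(n) if j != i]
    col = [block[j][i] for j in range(n) if j != i]
    return all(diag > v for v in row + col)


TRAITS = ["RAT", "JI", "TE"]
# Hypothetical test-by-interview block (illustrative values only).
BLOCK = [
    [0.10, 0.05, 0.12],
    [0.02, 0.55, 0.08],
    [0.15, 0.04, 0.60],
]

for i, trait in enumerate(TRAITS):
    print(trait, validity_diagonal(BLOCK)[i], exceeds_row_and_column(BLOCK, i))
```

In this made-up block the two control traits pass the comparison while the creativity trait does not, which is the pattern of result the text goes on to report.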

No convergent validity for Remote As- 
sociation appears in the test-interview and 
rating-interview blocks. The .27 correlation 
in the test-rating block does not provide sig- 
nificant evidence for convergent validity be- 
cause of the halo effect operating in the 
rating measure.3 Examination of the other
creativity measures indicates no evidence of 
convergent and discriminant validation. Only 
the control measures—Job Involvement and 
Time Extension—exhibit substantial conver- 
gent and discriminant validity. 

Method variance can be illustrated by 
examining the test and rating heterotrait-
monomethod (solid) triangles. For example,
consider the degree of association between 
Problem Identification and Ideational Flu- 
ency. The correlation for the same measure 
of these different traits is greater than the 
correlation between two different measures of 
the same trait. These findings indicate that 
the nature of the measurement process must 
be contributing to the relationship between 
the two traits. 

Two types of method variance seem to ap- 
pear. First, since the tests for Problem Identi- 
fication and Ideational Fluency require list- 
ing responses, the nature of the instrument 
probably accounts for some of the associations 
between both traits. Second, the high associa- 
tion among all the different traits in the rat- 
ing triangle indicates a halo effect. 


DISCUSSION


Lack of substantial convergent and dis- 
criminant validity for the selected measures 
of creative abilities raises serious questions 
about the nature of the tests. Before any 
implications are drawn, alternative explana- 
tions for the results in the matrix should be 
considered. 

First, an incorrect matrix design would
preclude any useful interpretation of the re-
sults. However, the above matrix does follow
the Campbell and Fiske (1959) specifications
of independent methods and traits (as indi-
cated by the low values in the test and inter-
view heterotrait-monomethod triangle). Also,
the convergent and discriminant validities ex-
hibited in the control trait measures seem to
indicate the matrix design is adequate.

3 The “halo” is indicated by the high correlations
among ratings of hypothesized independent traits
(i.e., RAT, Job Involvement, and Time Extension).

Second, the low values may be attributed to 
distribution or instrument factors. Each dis- 
tribution was examined for skewness and non- 
linearity. In cases where another statistical 
technique was more appropriate, a comparison 
was made with the values presented in the 
above matrix; in general there was little dif- 
ference in magnitude or relative relationships. 
The reliability of the instruments seems ade- 
quate and not a major contributor to the low 
matrix values. Also, the low intercorrelations 
in the solid triangles for tests and interviews 
suggest that the problem of social desirability 
is not a contributor to the observed results. 
The halo effect in the ratings, which cancelled 
the usefulness of the rating-test interviewer
comparison, seems more a function of lack of
knowledge about the ratee4 than some in-
herent characteristic of the rating technique
or some characteristic of its administration 
(e.g., a relationship with the length of rating 
time and observed halo). 

It may be concluded, then, that the low 
convergent and discriminant validity of the 
selected tests of creative ability is not a func- 
tion of inappropriate application of statistical 
techniques or of instrument error, but of the 
operational differences in the tests. This lack 
of convergence between different measures of 
the same trait raises some question as to the 
operational meaning of the tests. The implica- 
tions of these findings for future research on 
creativity may be summarized as follows: It 
will be necessary to extend the use of the 
multitrait-multimethod matrix to different 
tests for the traits under consideration, to 
different methods, and in different populations, 
before other types of validation work are 
undertaken. 

4 Raters participated in the interviews and data
from the interviews suggest the raters understood
the traits in terms of their own behavior and dis-
criminated between the creative traits and control
traits.


ASYMMETRICAL TRANSFER IN READING TEXTS
PRODUCED BY TELEPRINTER AND
BY TYPEWRITER 1


E. C. POULTON 2 
Applied Psychology Research Unit, Cambridge, England 


Sixty good readers and 66 poorer readers were given 90 sec. to read passages 
of about 450 words. They had then to answer 10 open-ended questions on the 
content. The passages and questions were reproduced in Siemens teleprinter 
capitals with triple spacing, and in IBM 72 elite upper and lowercase letters 
with single spacing. Length of lines averaged 6.5 in. After reading two passages 
in one style, two-thirds of the readers were transferred to a passage in the other 
style. The remaining one-third read one style throughout. The poorer readers 
found elite easier to comprehend than Siemens (p < .05). They showed positive
transfer when reading Siemens after elite (p < .05), and negative transfer when 
reading elite after Siemens (p < .05). The good readers did not find elite any 
easier than Siemens, and showed no reliable transfer effects. The product moment 
correlation between the scores on the first two passages was only .32, because 
the rate of comprehension depends upon previous knowledge. After correcting 
for attenuation, the correlation between the pooled rate of comprehension of 
the first two passages and Part 2 of the Tinker speed of reading test was .84. 


Offices may handle both teleprinted and 
typewritten material. At present teleprinters 
still print only in capitals. The letters of the 
Siemens teleprinter have a height of 8 points 
(2.7 mm.), and are typed in succession 10 
letters to the inch. The minimal vertical spac- 
ing between lines (which corresponds to lead- 
ing) is 4 points (1.3 mm.). Sometimes double 
or even treble spacing is used between lines. 
This results in only three or two lines of 
teleprint per inch of paper. 

In contrast, most typewritten material is 
typed in lowercase letters, with capitals only 
at the start of sentences and for proper names. 
The letters of elite typewriters have an x- 
height (the height of the rounded parts of the 
letters, excluding the ascenders and descend-
ers) of only 5 points (1.7 mm.), although the
total letter height is 9 points (3.0 mm.). The
letters are typed in succession 12 letters to
the inch. The usual single spacing gives a
vertical separation between lines (correspond-
ing to leading) of only 3 points (1.0 mm.).
This makes 6 lines of typewriting per inch
of paper.

1 The problem of transfer was raised by M. A. G.
Howgate of the British Government Communica-
tions Headquarters, who supplied the Siemens tele-
printed material used in the experiment. The pas-
sages were written and pretested by C. H. Brock.
Experimental Ss of the requisite reading abilities
were provided by A. J. Hull. P. M. E. Altham ad-
vised on the design of the experiment and on the
analysis of the results. K. Tayler ran the correla-
tions. Financial support from the British Medical
Research Council is also gratefully acknowledged.

2 Requests for reprints should be sent to the
author, Medical Research Council, Applied Psychol-
ogy Research Unit, 15 Chaucer Road, Cambridge,
England.
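
The line counts quoted for the two styles follow from simple arithmetic, which can be sketched as follows. This is a minimal illustration, assuming 72 points to the inch and assuming that double or treble spacing multiplies the single-spaced line pitch by two or three; the function name `lines_per_inch` is hypothetical.

```python
POINTS_PER_INCH = 72  # assumption: typographic points per inch


def lines_per_inch(letter_height_pt: float, leading_pt: float,
                   spacing: int = 1) -> float:
    """Lines per inch for a given letter height, leading, and line spacing."""
    pitch_pt = (letter_height_pt + leading_pt) * spacing
    return POINTS_PER_INCH / pitch_pt


# Siemens teleprinter: 8-point capitals with 4 points between lines.
for spacing, label in [(1, "single"), (2, "double"), (3, "treble")]:
    print(f"Siemens, {label} spacing: {lines_per_inch(8, 4, spacing):.0f} lines per inch")

# Elite typewriter: 9-point total letter height with 3 points between lines.
print(f"Elite, single spacing: {lines_per_inch(9, 3):.0f} lines per inch")
```

On these assumptions single-spaced elite packs three times as many lines into an inch as treble-spaced Siemens, which is the contrast used in the experiment.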

There can thus be quite a large difference 
between teleprinted and typewritten material. 
In offices which handle quantities of both 
kinds of material, complaints have been made 
that single-spaced elite type is too small to 
read comfortably. These complaints could not 
have been predicted from the results of previ- 
ous experiments. Tinker and Paterson (1928) 
and Tinker (1955) found that normally 
printed lowercase texts were easier to read 
than texts printed all in capitals. Poulton and 
Brown (1968, Table 2) found that texts type-
written in elite type were easier to read than
texts teleprinted in Siemens capitals, or typed 
in elite or pica capitals. The experimental 
finding that lowercase texts are the easier to
read conflicts with the complaints that elite 
type is too small. It suggests that the com- 
plaints may be based upon transfer effects. 
People who have been reading for most of the 
day material teleprinted in large capital let-
ters with wide spacing between lines may
show negative transfer when given material
to read typed in single-spaced elite letters. 

There is some evidence which suggests that 
this could happen. Fox (1963) compared ma- 
terial typed in standard elite with the same 
material typed in the small-sized capitals of 
Gothic elite. When read first, the material 
typed in standard elite was read 18% more 
quickly than the material typed in Gothic. 
But when read second, after reading material 
typed in the other style, the material in the 
small-sized Gothic capitals was read 20% 
more quickly than the material in the stan- 
dard elite. If not due to chance differences 
between the two groups of readers, this is a 
two-way asymmetrical transfer effect (Poulton 
& Freeman, 1966). Texts in small capitals 
were read more quickly after reading texts 
in standard elite letters. This is the usual 
positive transfer or practice effect. Texts 
in standard elite were read more slowly 
after reading texts in Gothic capitals. This is 
an unexpected negative transfer effect, which 
corresponds in direction to the complaints 
just described. 

The present experiment was designed to in- 
vestigate this problem. Control groups stuck 
throughout to a single condition, either Sie- 
mens teleprinted material or material typed 
in elite (see Table 1). Experimental groups 
practiced on one kind of material, and were 
then transferred to the other kind. To de- 
termine whether negative transfer was due 
simply to lack of familiarity with material in 
a particular style, half of the readers in each 
experimental group were given to read first 
a passage in the style they were eventually to 
be transferred to. The other half of the readers 
met the style for the first time during the ex- 
periment in the transfer condition. 


METHOD 
Materials 


Three passages, each of about 450 words, were 
written on aspects of Roman life: farming, soldier- 
ing, and social activity. Sentences were kept short 
and to the point. Each passage was reproduced twice 
in lines which averaged 6.5 in. in length. It was 
teleprinted with triple spacing, using the Siemens all- 
capitals typeface. And it was typed single-spaced 
using an IBM 72 elite electric typewriter. Both re- 
productions were on 4-ply teleprinter paper rolls 
which have carbon paper interleaved. The top copies,
which were on tracing paper, were used to make the
dyeline copies for the experiment. 

There were 10 open-ended questions on each pas- 
sage to test for comprehension. A question required 
only a few words to answer. The 10 questions were 
spread evenly over the text, so that a person who 
had read 80% of the passage would be able to at- 
tempt 8 of the 10 questions. After each question 
there was a line of dots (again triple- or single- 
spaced) on which the reader had to write his answer. 
The question sheets were reproduced exactly like the 
passages: teleprinted with triple spacing and typed 
in single-spaced elite on 4-ply teleprinter paper. The 
top copies on tracing paper were used for photo-
copying by the offset process.
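
The relation between the amount read and the questions that can be attempted can be sketched as follows. This is a minimal illustration, assuming the questions are spread exactly evenly and that a question can be attempted once the corresponding tenth of the passage has been read; both function names are hypothetical.

```python
def attemptable_questions(percent_read: int, n_questions: int = 10) -> int:
    """Questions a reader can attempt after reading percent_read of the text."""
    if not 0 <= percent_read <= 100:
        raise ValueError("percent_read must lie between 0 and 100")
    return percent_read * n_questions // 100


def words_per_minute(words: int, seconds: float) -> float:
    """Reading rate needed to cover `words` in `seconds`."""
    return words * 60 / seconds


print(attemptable_questions(80))   # the case given in the text
print(words_per_minute(450, 90))   # rate needed to finish a test passage
```

A reader therefore has to average about 300 words per minute to reach the last question in the time allowed, so the score reflects rate of comprehension rather than comprehension alone.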


Experimental Design and Subjects 


The experimental design is shown in Table 1. It 
can be regarded as an experiment on six groups, each 
of 10 good readers, which has been repeated on six 
groups each of 11 poor readers. A group of good 
readers was paired with a group of poor readers. 
Each such pair was treated differently. Previews 1 
and 2 involved reading passages without being tested 
for comprehension. 

The two elite control groups had elite passages 
throughout. There were two pairs of elite transfer 
groups. The prewarned groups had an elite passage 
in Preview 1, while the unwarned groups did not. 
Subsequently all four groups had Siemens passages 
until the final Test 3, when they were given an elite 
passage to read. The Siemens control groups and the 
prewarned and unwarned Siemens transfer groups 
had the corresponding conditions. Everyone read the 
three test passages in the same order. Thus the rela- 
tive difficulty of the three passages is confounded 
with practice effects. 
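
The six conditions can be written out as sequences of styles, as sketched below. This is a minimal illustration of the design as described in the text and Table 1; the function name `sequence`, the role labels, and the dictionary layout are hypothetical. Transfer groups are named after the style they read in Test 3.

```python
PHASES = ("Preview 1", "Preview 2", "Test 1", "Test 2", "Test 3")


def sequence(final_style: str, role: str):
    """Styles read in each phase by the group named for `final_style`."""
    other = "Siemens" if final_style == "Elite" else "Elite"
    if role == "control":
        return (final_style,) * 5
    # Prewarned groups meet the transfer style once, in Preview 1.
    first = final_style if role == "prewarned" else other
    return (first, other, other, other, final_style)


design = {
    (style, role): sequence(style, role)
    for style in ("Elite", "Siemens")
    for role in ("control", "prewarned", "unwarned")
}

GOOD_PER_GROUP, POOR_PER_GROUP = 10, 11
total_readers = len(design) * (GOOD_PER_GROUP + POOR_PER_GROUP)

for key, styles in design.items():
    print(key, dict(zip(PHASES, styles)))
print("total readers:", total_readers)
```

Laying the design out this way makes it easy to see that the prewarned and unwarned groups differ only in Preview 1, so any difference between them on Test 3 reflects that single prior exposure.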

There were 60 good readers who had all scored 
50% or more in a previous experiment of a similar 
nature. There were also 66 poor readers who had 
all failed to score as much as 50% previously. Ten 
good readers and 11 poor readers were allocated to 
each of the six experimental conditions. Allocation 
was in order of arrival, except for the restriction 
that each group of good readers had the same number 
of very good and good readers. Each group of poor 
readers had the same number of poor and very poor 
readers. 

All 126 readers were members of a panel main- 
tained at the Applied Psychology Research Unit in 
Cambridge. Just under one-fourth were men. Their 
ages ranged from 24 to 73 yr. About half wore read- 
ing glasses for the experiment. They were paid 7s. 6d. 
per hr. (about $.90) for their services plus traveling 
expenses. 


Procedure 


Readers allocated to different conditions were 
tested simultaneously in groups. Those reading elite 
passages in Tests 1 and 2 were seated separately 
from those reading Siemens passages. Thus after 
Preview 1 nobody saw passages in a style other 
than the style which he was reading until transfer
in Test 3. The experiment was introduced as a
comparison of teleprinted and typewritten texts.
Beyond this the readers were told nothing of the
aims of the experiment.


TABLE 1

EXPERIMENTAL DESIGN AND MEAN PERCENT COMPREHENSION
FOR GOOD AND POOR READERS

                        All readers           Groups of 10 good readers    Groups of 11 poor readers
Experimental condition  Preview 1 Preview 2   Test 1   Test 2   Test 3     Test 1   Test 2   Test 3

Elite
  Control               Elite     Elite       Elite    Elite    Elite      Elite    Elite    Elite
                                              52a      65a      46a        47       47       50bc
  Transfer prewarned    Elite     Siemens     Siemens  Siemens  Elite      Siemens  Siemens  Elite
                                              51       61       50         40       48       33b
  Transfer unwarned     Siemens   Siemens     Siemens  Siemens  Elite      Siemens  Siemens  Elite
                                              62       64       47         37       45       34b
Siemens
  Control               Siemens   Siemens     Siemens  Siemens  Siemens    Siemens  Siemens  Siemens
                                              61       69d      50d        41a      41a      20acef
  Transfer prewarned    Siemens   Elite       Elite    Elite    Siemens    Elite    Elite    Siemens
                                              59       68       58         45       53       38e
  Transfer unwarned     Elite     Elite       Elite    Elite    Siemens    Elite    Elite    Siemens
                                              55       67       56         45       55       30f
All conditions combined
  Elite                                       Elite    Elite    Elite      Elite    Elite    Elite
                                              55       67       48         45       51       39
  Siemens                                     Siemens  Siemens  Siemens    Siemens  Siemens  Siemens
                                              58       65       55         40       45       29

a Test 3 vs. Tests 1 and 2 combined, p < .05 or better.
b Elite control vs. Elite transfer, prewarned and unwarned, p < .05.
c Elite control vs. Siemens control, p < .001.
d Test 3 vs. Test 2, p < .01.
e Siemens control vs. Siemens transfer prewarned, p < .05.
f Siemens control vs. Siemens transfer unwarned, p < .05 (one-tailed test).
g Elite vs. Siemens, p < .05 on Tests 1 and 2 combined.

In the two preliminary preview conditions pas- 
sages of about 500 words were read for 2 min. The 
readers were told that this was merely to acquaint 
them with the style of print; they would not be 
questioned on the content. In each test a test passage 
of about 450 words was studied for 90 sec. Four 
minutes were allowed subsequently for answering the 
10 questions, but most readers did not require as 
long as this. The experiment lasted about 30 min. 

After the experiment everyone was given 10 min. 
on Part II of the Tinker Speed of Reading Test 
(Tinker, 1955). 


RESULTS 


The results are given in Table 1. The 
groups of 10 good readers are in the middle. 
The groups of 11 poor readers are on the 
right. The most difficult of the three passages 
was chosen for the transfer test, Test 3. This 
is indicated by the results of three out of the 
four control groups. The top row of the table 


gives the mean comprehension scores of the 
two groups which read elite type throughout. 
The 10 good readers in the middle of the 
table did reliably worse on Test 3 than their 
average on Tests 1 and 2 (p < .05 on a two- 
tailed Wilcoxon test, Siegel, 1956, p. 75). The 
11 poor readers on the right of the table did 
no worse on Test 3 than on Tests 1 and 2. 

The fourth row of the table gives the cor- 
responding data for the two groups which 
read Siemens throughout. Here it was the 
poor readers on the right of the table who 
did reliably worse on Test 3 (p < .01). For
the good readers in the middle of the table,
Test 3 was reliably worse than Test 2 (p <
.01), but it was not reliably worse than the 
average of Tests 1 and 2 (p > .05). 

The good readers showed no reliable differ- 
ence between elite and Siemens, and no re- 
liable transfer effects on Test 3. The key re- 
sults concern the poor readers. The data from 
the poor readers were subjected to analysis of
variance. After removing the differences be-
tween individuals and the differences among
the three tests (p < .001), the residual in the
3 × 6 matrix given on the right of Table 1
was found to be reliable at the .02 level. Thus 
there must be reliable differences in the table 
related to the Siemens and elite lettering, or 
to the orders of reading the Siemens and 
elite, or to both the lettering and the orders. 

The bottom two rows on the right of the 
table show that the poor readers scored higher 
on the elite passages than on the Siemens. 
For Tests 1 and 2 combined, the difference in 
favor of elite was reliable at the .05 level on 
a two-tailed Mann-Whitney U test (Siegel, 
1956, p. 116). This is a simple comparison be- 
tween groups, and is not contaminated by 
transfer effects. 

The right-hand side of the top row of the 
table gives the mean scores for comprehension 
of the control group of 11 poor readers who 
read exclusively elite texts. They did as well 
on the more difficult third passage as they 
did on the two previous passages. The right 
sides of the next two rows give the mean 
scores of the two groups of poor readers 
which transferred from Siemens to elite on 
Test 3. Both groups did rather less well than 
the control group on Test 1, but even the 
combined difference was not reliable statisti-
cally (.05 < p < .1). Taken together the two
groups did reliably better on Test 2 (p <
.05). Their mean scores for comprehension 
were then about the same as the mean of the 
control group. On Test 3 they transferred 
from Siemens to elite. Both groups then had 
means which were reliably (p < .05) less
than the mean of the control group. 

The right side of the fourth row in the 
table gives the mean scores for comprehension 
of the control group of poor readers who read 
exclusively Siemens texts. They did reliably 
worse on the difficult third passage than on 
either of the two previous passages (p < 
.01). The right sides of Rows 5 and 6 give the 
means of the two groups of poor readers 
which transferred from elite to Siemens on 
Test 3. They did a little better than the con- 
trol group on Test 1. On Test 2 the two 
groups taken together did reliably better than 
on Test 1 (p < .02). Their means for comprehension
were then reliably above the means
of the control group (p < .05). On Test 3
they transferred from elite to Siemens. Both 
groups did reliably better than the control 
group which had read Siemens throughout 
(p < .05 on two-tailed and one-tailed tests,
respectively).

For the good and poor readers combined 
there was a product moment correlation of 
.56 between the pooled results on Tests 1 
and 2, and Part II of the Tinker Speed of 
Reading Test. The correlation between the 
scores on Tests 1 and 2 was only .32. The 
median reliability of the Tinker test is .86 
(Tinker, 1955). After correcting for attenua- 
tion by the Spearman-Brown prophecy for- 
mula (Guilford, 1936, p. 368), the correla- 
tion between rate of comprehension as mea- 
sured here and Tinker’s speed of reading 
measure was .84. 
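
The arithmetic behind such a correction can be sketched as follows. This is a minimal illustration in Python, not the original computation. It assumes that the reliability of the pooled score on Tests 1 and 2 is obtained by stepping up the .32 inter-test correlation with the Spearman-Brown formula, and that the corrected correlation is the observed correlation divided by the square root of the product of the two reliabilities. The function and variable names are illustrative only.

```python
from math import sqrt


def spearman_brown(r, k=2.0):
    """Reliability of a test lengthened k times, given the correlation r
    between its comparable parts (Spearman-Brown prophecy formula)."""
    return k * r / (1.0 + (k - 1.0) * r)


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimated correlation between two measures if both were perfectly
    reliable (classical correction for attenuation)."""
    return r_xy / sqrt(rel_x * rel_y)


# Figures as printed in the text (rounded to two decimals).
r_tests_1_2 = 0.32   # correlation between scores on Tests 1 and 2
r_observed = 0.56    # pooled Tests 1 and 2 vs. Tinker test, Part II
rel_tinker = 0.86    # median reliability of the Tinker test

# Assumption: the pooled score is treated as a test of double length.
rel_pooled = spearman_brown(r_tests_1_2)
r_corrected = correct_for_attenuation(r_observed, rel_pooled, rel_tinker)

print(round(rel_pooled, 3), round(r_corrected, 3))
```

With the rounded figures printed above, this sketch gives a value near .87 rather than the reported .84, so the original calculation presumably used unrounded or slightly different inputs.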


DISCUSSION 
Asymmetrical Transfer 


Table 1 shows that the poor readers read 
Test 3 in elite type reliably less effectively 
after reading Tests 1 and 2 in Siemens than 
they did after reading Tests 1 and 2 in elite,
whereas they read Test 3 in Siemens reliably
more effectively after reading Tests 1
and 2 in elite than they did after reading 
Tests 1 and 2 in Siemens. This is a two-way 
asymmetrical transfer effect (Poulton & Free- 
man, 1966). It was shown only by the poor 
readers, who were reliably worse on Siemens 
than on elite texts. The good readers were no 
worse on Siemens than on elite. They did not 
show reliable transfer effects. 

The transfer effect was not due to lack of 
familiarity with the lettering of Test 3. The 
two prewarned groups, who had read a prac- 
tice passage in the same lettering during Pre- 
view 1 (see left of Table 1), showed a two- 
way asymmetrical transfer effect which was 
at least as large as that shown by the two 
unwarned groups. 

A detailed analysis of the last column of 
Table 1 suggests that the asymmetrical trans- 
fer was due partly to the behavior of the 
control groups of poor readers. The elite con- 
trol group did as well on the difficult third 
test passage as on the two previous passages 
and had the best average score for comprehension,
whereas the Siemens control group
scored only half as much on the difficult third 
test passage as on the two previous passages 
and had the worst score for comprehension. 
Apparently the combination of the difficult 
test passage with the difficult Siemens capitals 
was too much for the poor readers. The large 
and highly significant difference between the 
means of the two control groups was a pre- 
disposing factor in the two-way asymmetrical 
transfer effect. Any mean falling in the gap 
between the two means necessarily showed 
either positive or negative transfer. 

When a balanced factorial design is used, 
the asymmetrical transfer will reduce the ad- 
vantage of elite over Siemens, an effect to be 
expected with poor readers. It could account 
for the smaller differences between all capitals 
and normal upper and lowercase texts found 
by Poulton and Brown (1968) in their Latin- 
square experiment, compared with their ex- 
periment with separate groups. 

Asymmetrical transfer between capitals and 
lowercase may also have occurred among the 
poorer readers in Tinker and Paterson’s 
(1928) experiment. In this case the difference 
of about 12% found in favor of the lowercase 
may underestimate the actual difference in the 
difficulty of reading. Unfortunately the authors 
give only the means for good and poor readers 
combined. The means do not reveal any 
overall asymmetrical transfer. This may be 
because the effect on the poorer readers has 
been masked by the results of the good readers 
who showed no effects. Positive transfer be- 
tween the lowercase and all capitals text may 
have occurred among the poorer readers in 
Tinker’s (1955) repeat experiment—in which 
case the 12% difference found in favor of 
lowercase may again have been an under- 
estimation. 


Variability in the Rate of Comprehension 


The correlation between the rate of com- 
prehension on the first two passages was only 
.32. This may have resulted partly from in- 
dividual differences in the way people reacted 
to the first passage. Everyone had done simi- 
lar tests before, but not for a year or more. 
However, the principal cause for the low correlation
was probably differences in familiarity
with the material read. People memorize most
easily what they know already (Poulton,
1957). One passage or part of a passage may 
contain familiar ideas. It is read quickly and 
gives a high score for comprehension. Another 
passage or part of a passage may be quite 
unfamiliar. It has to be read slowly, or per- 
haps twice, and even then gives a low score 
for comprehension. Thus measures of the rate 
of comprehension necessarily contain a lot of 
within-individual variability. 

The Tinker Speed of Reading Test suffers 
less from variability in the rate of compre- 
hension, because it is more a test of speed 
than of comprehension. The test requires the 
reader to cross out 1 word in about 30. The 
rate of crossing out words with a pencil must 
be reflected in the overall speed score. The 
incongruous word towards the end of each 
30-word item can often be spotted with only 
quite a vague idea of what the item is about. 
Deep comprehension is not required. The em- 
phasis is on speed, both in crossing out words 
and in reading rapidly. This may account for 
the relatively high median reliability of the 
Tinker test, .86. 

However a reading test whose reliability 
is based upon the speed of locating and cross- 
ing out 1 word in 30 is not necessarily repre- 
sentative of normal reading. It is probably 
closer to skimming or scanning (Poulton, 
1967). It seems likely that print which is 
easy to comprehend is also easy to skim or 
scan. After correcting for attenuation, the cor- 
relation between rate of comprehension as 
measured here and by the Tinker test was .84. 
This suggests that the abilities measured by 
the two tests may be similar. But more evi- 
dence is required. The correlation was only 
.56 before it was corrected for attenuation.


PREDICTING VOCATIONAL SUCCESS FOR NEUROPSYCHIATRIC
PATIENTS WITH THE EDWARDS PERSONAL 
PREFERENCE SCHEDULE 


ALLEN GOSS 1 


University of Saint Thomas and Texas Research Institute of Mental Sciences 


This study investigates the possibility of predicting success—fail vocational out- 
come from a personality inventory. The Edwards Personal Preference Schedule 
(EPPS; Edwards, 1959) was administered to a group of 58 psychiatric pa- 
tients at the time of acceptance to a Vocational Rehabilitation Ward. Score 
areas of the EPPS utilized in the predictive model varied with the diagnostic 
categories. The results show that model predictions above the base rate of 
success are possible for all of the diagnostic groups and that predictive ac- 
curacy increases when results are specifically related to particular groups. 


Numerous investigators have used the Ed- 
wards Personal Preference Schedule (EPPS) 
in examining relations between hypothesized 
needs and various behavioral criteria. Work- 
ing in the area of vocational choice, Norrell 
and Grater (1960) reasoned that if the self- 
concept was distorted by a lack of self-aware- 
ness, vocational choice would tend to be inap- 
propriate. High and low self-awareness groups 
were significantly different on Succorance and 
Order needs from the EPPS. Pool (1965), 
using a Veterans Administration hospital pop- 
ulation of patients with whom vocational 
counseling was regarded as either effective 
or ineffective, found that the ineffective group 
had lower need scores for Intraception and 
Endurance and higher need scores for Suc- 
corance and Autonomy. While only part of 
Pool’s results were similar to those of Norrell 
and Grater, they both gave dependency in- 
terpretations, and they both stressed the im- 
portance of psychological needs in making 
vocational choices. 

The focus of the present study was to 
investigate the utility of EPPS need scores in 
differentiating patients accepted to a voca- 
tional rehabilitation ward with respect to 
their subsequent employment outcome. Another
interest was to explore the usefulness of
a simple model designed to utilize EPPS score
information from the analysis to place patients
in success or fail groups.


1 Information for this study was gathered while
the author was a predoctoral trainee and a postdoctoral
fellow at the Veterans Administration Hospital
in Houston, Texas. Requests for reprints should
be sent to the author, Baylor University, College of
Medicine, 1200 Moursund Avenue, Houston, Texas
77025.

In line with results from previous investi- 
gations it was hypothesized that those pa- 
tients who gained suitable employment would 
have significantly higher scores on Affiliation, 
Intraception, and Nurturance than those who 
did not gain employment, and a lower Suc- 
corance score. 


PROCEDURE 


Subjects. The Ss were 58 male neuropsychiatric 
patients accepted for the vocational rehabilitation 
program at the Veterans Administration Hospital, 
Houston, Texas. Patients accepted to this ward re- 
ceive vocational counseling, industrial therapy through 
supervised work experience in the hospital, and/or 
educational therapy, as well as job placement as- 
sistance. 

Tests. All Ss were given the EPPS as part of the 
psychological testing just prior or subsequent to 
admission for the rehabilitation program. The EPPS 
was not used in the vocational counseling of the 
patient, nor were the scores readily available to the 
counselors. 

Method. The criterion of employment in the study 
was related to the discharge of the patient. If the 
patient was discharged from the ward with a suit- 
able job, he was considered a success. Employment 
was judged as suitable by the ward vocational 
counselor if the patient had sufficient skills to meet 
the demands of the job and if, while on the ward, 
the patient worked consistently at the job during 
the several week time-period necessary to accumulate 
sufficient funds to be self-supporting. If transferred 
or discharged from the ward in any other category, 
the patient was considered a failure with respect to 
the employment criterion of the ward. Each of the
15 EPPS need scores was analyzed with respect to
the success-failure criterion for the entire sample and
for the following diagnostic subgroups: (a) alcoholic
(ALC); (b) anxiety-depression (A-D); (c) patients
with physical disabilities (P-D); and (d) schizo- 
phrenics (SCH). Results from previous investiga- 
tions with the vocational rehabilitation population 
(Goss, 1966, 1968; Goss & Pate, 1966, 1967) in- 
dicated the need for separate analysis for each of the 
diagnostic groups. The success-failure diagnostic
groups had minimal differences with respect to age, 
education, and number of previous hospitalizations. 

Since the second concern of the study was to 
explore the possibility of placing patients in success 
or fail groups which exceed the population base 
rates through the use of a predictive model utilizing 
score information, its operation will be briefly de- 
scribed. Items with large P values were included in 
order to reduce Type 2 (beta) error and to increase 
power. Scores with P values of less than .20 were 
included as predictors; however, weights assigned to 
these scores depended on the P value of the items. 
Items with a P value between 0 and .10 were
weighted +2, 0, or —2, depending on the relation 
of the item scores to the mean of the success or fail 
group. If a value was equal to or exceeded the suc- 
cess mean, it was given the positive weight assigned 
that item; if the value was between the success and 
fail mean, it was given a zero; and if the value was 
equal to or below the mean fail value, it was given 
a minus weight. Similarly, weights of ±1 were
assigned to items with P values between .10 and .20.
By algebraically summing the positive and negative 
weights of the predictor values, total weight scores 
of a plus, zero, or minus value were derived for each 
individual in the various diagnostic groups, and from 
these values the model predictions were made. Pre- 
dictively, positive values indicated success, negative 
values indicated failure, and zero values registered 
an area of unpredictability. 
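
The weighting scheme just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's program. The boundary cases (a P value of exactly .10 or .20) are resolved here by inclusion, and a need on which the success mean lies below the fail mean is scored in mirror image, which the text does not spell out. The predictor means are the ALC subgroup values reported in this article; the three patients and all function and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Predictor:
    name: str
    success_mean: float
    fail_mean: float
    p_value: float  # upper bound on P as reported, e.g. p < .05 -> 0.05


def weight_magnitude(p_value):
    """2 for P up to .10, 1 for P up to .20, otherwise not a predictor."""
    if p_value <= 0.10:
        return 2
    if p_value <= 0.20:
        return 1
    return 0


def item_weight(score, pred):
    """Signed weight for one need score of one patient."""
    w = weight_magnitude(pred.p_value)
    if pred.success_mean >= pred.fail_mean:
        if score >= pred.success_mean:
            return w
        if score <= pred.fail_mean:
            return -w
    else:
        # Assumption: mirror the rule when successes score lower.
        if score <= pred.success_mean:
            return w
        if score >= pred.fail_mean:
            return -w
    return 0  # score falls between the two group means


def predict(scores, predictors):
    """Sum the signed weights and map the total to a prediction."""
    total = sum(item_weight(scores[p.name], p) for p in predictors)
    if total > 0:
        return total, "success"
    if total < 0:
        return total, "failure"
    return total, "unpredicted"


# ALC subgroup means and P values as reported in this article.
ALC = [
    Predictor("Deference", 44.40, 71.25, 0.05),
    Predictor("Succorance", 62.30, 34.50, 0.04),
    Predictor("Abasement", 52.50, 30.37, 0.20),
    Predictor("Nurturance", 56.50, 35.25, 0.17),
]

# Hypothetical patients, invented only to exercise the rule.
patient_a = {"Deference": 40, "Succorance": 65, "Abasement": 40, "Nurturance": 30}
patient_b = {"Deference": 75, "Succorance": 50, "Abasement": 55, "Nurturance": 45}
patient_c = {"Deference": 60, "Succorance": 50, "Abasement": 53, "Nurturance": 35}

result_a = predict(patient_a, ALC)
result_b = predict(patient_b, ALC)
result_c = predict(patient_c, ALC)
print(result_a, result_b, result_c)
```

A total of zero is kept as a separate outcome because the model treats it as an area of unpredictability rather than forcing a classification.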

It was hypothesized that the successful group 
would have higher scores on Affiliation, Intraception, 
and Nurturance and a lower score on Succorance 
than the failures. The results for the total sample 
indicate that successes generally had a higher score 
on Succorance (S [success]: 64.58, F [failure]: 
47.44; F=5.93, p<.02), and a lower score on 
Deference (S: 44.58, F: 60.56; F=5.08, p< .03). 
Thus, the Succorance score was significant (p < .02) 
in the wrong direction. Affiliation (p<.77), In- 
traception (p< .53), and Nurturance (p< .29) did 
not differentiate the population with respect to the 
employment criterion. Thus EPPS score areas of 
significance in previous vocational studies employing 
self-awareness and counseling-suitability criteria were 
not validated with the employment outcome criterion 
utilized in this investigation. 

Areas which meet the model requirements include
the four need scores which differentiated the
ALC group: Deference (S: 44.40, F: 71.25; F = 4.55,
p < .05), Succorance (S: 62.30, F: 34.50; F = 5.00,
p < .04), Abasement (S: 52.50, F: 30.37; F = 1.83,
p < .20), and Nurturance (S: 56.50, F: 35.25; F =
2.09, p < .17). The A-D group contributed three
need scores: Affiliation (S: 46.77, F: 27.57; F =
1.79, p < .20), Succorance (S: 60.54, F: 36.14; F
= 2.90, p < .11), and Heterosexuality (S: 72.77, F:
46.43; F = 6.00, p < .02). The P-D group showed
the least differentiation with one score providing
minimal contribution, Nurturance (S: 65.50, F:
43.40; F = 2.33, p < .17). The SCH population
showed slight differences on Affiliation (S: 39.83,
F: 67.00; F = 2.16, p < .18), and on Aggression (S:
59.00, F: 18.80; F = 6.82, p < .03).


TABLE 1

MODEL PREDICTION OF EMPLOYMENT SUCCESS FOR
PSYCHIATRIC PATIENTS FROM EPPS SCORES

            No. of     Percentage of patients   Percentage   Ward employment
Subgroups   patients   in model predicted for   correct      base rate

ALC            18               94                  82             48
A-D            20               80                  81             73
P-D             9               67                  83             52
SCH            11              100                  82             54
Total          58               67                  74             58

An analysis of variance between the scores of the 
four diagnostic groups indicated that there were two 
areas which significantly differentiated these groups: 
Achievement (p< .04) and Order (p< .01). These 
differences were due to the low A-D group mean 
scores, indicating that the lowest need Achievement 
and Order scores occurred in the group with the 
highest percentages of success. 

The model results are presented in Table 1. These 
results show that predictions above the base rates 
of success are possible for all populations and that 
the accuracy and percentage of these predictions in- 
crease when the results are specific to particular pop- 
ulations. From these results the authors conclude 
that attempting to predict vocational outcomes, or any
other performance-specific behavior, is a reasonable
task provided that the authors wish to make predic- 
tions for fairly homogeneous groups and that they 
can gain behavior-specific criterion information. The 
present results were related to the employment out- 
come of patients; they compare very favorably with 
the results of previous investigations. The relations
between constructs such as dependency, self-awareness,
and effective vocational counseling are clearly
less than isomorphic to the objective requirements of 
a workaday world.


RELATION BETWEEN EMBEDDED FIGURES TEST
PERFORMANCE AND SIMULATOR BEHAVIOR 


GERALD V. BARRETT 


Management Research Center, University of Rochester 


AND CARL L. THORNTON AND PATRICK A. CABE


Goodyear Aerospace Corporation, Akron, Ohio 


Relationships previously found with reaction to an emergency situation and 
simulator sickness compared to the Rod and Frame Test (RFT) measure of 
perceptual style were extended using a second perceptual style measure, an 
embedded figures test (EFT). The RFT was significantly related to emergency 
behavior and to simulator sickness. The EFT was significantly related to 
emergency behavior but not to the simulator sickness. Implications for the use 
of both tests in the prediction of driving behavior are discussed. 


Barrett and Thornton (1968) found cor- 
relations from .55 to .75 between perceptual 
style as measured by Series 3 (S3) of the Rod
and Frame Test (RFT) and emergency be- 
havior in an automobile simulator. Witkin, 
Lewis, Hertzman, Machover, Meissner, and 
Wapner (1954) found that there was a relationship
between S3 and the Embedded Figures
Test (EFT). It was possible, therefore,
that the EFT would also be related to emer- 
gency behavior. Logically, the task of visually 
extracting a geometric pattern from a com- 
plex pattern (EFT) was similar to the emer- 
gency task of detecting a pedestrian (in a 
complex background) moving into the path 
of a vehicle. 

Barrett and Thornton (1968) also found 
that good S3 performance was related to
simulator sickness. Since in the simulator the 
visual impression of motion on the screen 
was not accompanied by any physical mo- 
tion, sensitivity to visual-kinesthetic conflict 
(RFT) may explain the phenomenon. Also, 
the ability to disembed (EFT) the conflict- 
ing cues may be important. 

Two hypotheses, then, were tested (a) that 
there would be a significant relationship be- 
tween EFT performance and behavior in an 
emergency situation; (b) that there would be 
a significant relationship between EFT per- 
formance and experience of simulator sickness. 


1 Requests for reprints should be sent to Carl L. 
Thornton, Department 459 Plant H, Life Sciences Re- 
search Department, Goodyear Aerospace Corporation, 
1210 Massillon Road, Akron, Ohio 44315. 


METHOD
Subjects 


A random sample of 50 male Ss (aged 30-45) 
were selected from approximately 1200 employees 
in a division of an aerospace corporation. Some Ss
developed simulator sickness, with 26 Ss leaving before
the emergency trial. The data for 3 Ss were not
used, 2 because of an error in procedure and 1 because
he became aware of the purpose of the study.
Twenty of the remaining 21 Ss were given the RFT 6 mo. later. Six
months after that, 18 of the 20 Ss remaining were 
able to be retested on the EFT. For the simulator 
sickness comparison, 37 of 46 Ss tested on the RFT 
were available for retesting with the EFT. 


Apparatus 


A standard EFT (Form Cf-1) supplied by the 
Educational Testing Service was used (French, Ek- 
strom, & Price, 1963). The simulator consisted of a 
stationary automobile with a projection screen in 
front of the windshield. The visual scene was ob- 
tained from a television camera mounted on a 
movable track over a scale model highway. The 
camera moved in direct response to the accelerator, 
brake, and steering movements in the automobile. 
In this way S had complete control of changing the 
visual scene. 

Two items of a questionnaire, given to Ss 6 mo. 
after simulator operation, were concerned with sub- 
jective estimates of simulator sickness. The Ss were 
asked to rate the discomfort they experienced while 
operating the simulator and to estimate how long 
after simulator operation that discomfort persisted. 


Procedure 


The EFT was administered in two equal lengths 
(16 items in two 10-min. periods). Score was the 
number of items found in 20 min. The S’s task in 
the emergency behavior study was to respond appropriately
when a human-like dummy emerged suddenly
onto the highway. The Ss were unaware that
the sequence of events was to occur. Sickness measures
were obtained by use of a questionnaire approximately
6 mo. after the simulator trials.


RESULTS 


The results of the EFT were compared to 
two emergency criteria, initial brake reaction 
time and deceleration rate. The y = x reaction
time-RFT comparison had yielded a
Pearson r of .61, while the 1/y = x deceleration
rate-RFT comparison yielded an r of .74
(Barrett & Thornton, 1969). As Thornton 
and Richards (1968) have pointed out, the 
comparison of time and rate scores neces- 
sitates reciprocal data transformations. Since 
the EFT used in the present study has a rate 
score, reciprocals were used. The y = 1/x re- 
action time-EFT correlation was .54 (p< 
.03); the 1/y = 1/x deceleration rate-EFT 
correlation was .49 (p < .05). The RFT-1/EFT
correlation was .83. Although the
correlations were lower, the relations were 
much better than those obtained with Series 
1 and 2 of the RFT. 
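
The effect of the reciprocal transformation can be illustrated with a short Python sketch. The data below are invented purely for illustration and are not the study's data; the variable names are hypothetical. The sketch shows only the mechanics: a count score (items found) tends to correlate negatively with a time score, and taking its reciprocal puts both measures on a comparable footing, so that the correlation becomes positive.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six Ss (NOT data from the study).
reaction_time_sec = [0.8, 1.0, 1.3, 0.9, 1.6, 1.1]  # brake reaction time
eft_items_found = [14, 11, 7, 12, 5, 9]             # items found in 20 min.

# Reciprocal transformation of the EFT score (x -> 1/x).
reciprocal_eft = [1.0 / x for x in eft_items_found]

r_raw = pearson_r(reaction_time_sec, eft_items_found)
r_reciprocal = pearson_r(reaction_time_sec, reciprocal_eft)
print(round(r_raw, 2), round(r_reciprocal, 2))
```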

The highest correlation for the RFT-simu- 
lator sickness comparison had been .33-.55. 
For the EFT comparison they dropped to 
.10-.29. The first hypothesis, then, was confirmed,
but not the second.


DISCUSSION 


Lower correlations between EFT and cri- 
teria scores than were found between S3 and 
the same criteria were the result of a number 
of factors: (a) The time between measurements
was 1 yr. for the EFT and 6 mo. for
the RFT. In a year’s time perceptual style,
as measured by the EFT, may have changed.
(b) The EFT is not as reliable as the RFT
(Witkin et al., 1954). The RFT is an extremely
reliable test (r = .95), while over even
short periods of time the EFT reliability is
only moderate (r = .6-.9). Split-half reliability
for the present sample was only about .58.
(c) EFT and RFT tap slightly different perceptual
dimensions. While the correlation between
the RFT series and EFT is usually significant,
only 36%-74% of the variance is
common. It is evident that perceptual style is
not a unitary construct. (d) The EFT sample
was smaller than the RFT sample. It is conceivable
that the dropping of several Ss affected
the results.

Despite these drawbacks it is encouraging 
that a significant relationship was found be- 
tween emergency behavior and EFT per- 
formance. It is suggested that both the RFT 
and EFT be given immediately after gather- 
ing emergency behavior data. Both should be 
related to quickness of response and may be 
combined as powerful tools for prediction of 
driving behavior. Other situations where quick 
detection and recognition are necessary may 
also be studied fruitfully with these two instruments.


ANALYZING THE EXPERT JUDGE:
A DESCRIPTIVE STUDY OF A STOCKBROKER’S DECISION PROCESSES 1


PAUL SLOVIC 2 


Oregon Research Institute 


This study illustrates an analysis-of-variance technique for describing the use 
of information by persons making complex judgments. The Ss were two stock- 
brokers who rated the growth potential of stocks on the basis of 11 factors 
taken from Standard & Poor’s reports. The technique proved capable of pro- 
viding a precise quantitative description of configural and nonconfigural in- 
formation utilization. Each broker exhibited a substantial amount of con- 
figural processing. The technique appears to have promise for providing experts 
with insight into their own processes and for teaching and evaluating “student”
judges.


The task of the expert judge, no matter 
what his occupation—military officer, detec- 
tive, businessman, physician, clinical psycholo- 
gist, financial analyst, etc—requires him to 
combine items of information from a number 
of different sources into a decision or judg- 
ment. The key to the expert’s success resides 
in his ability to interpret and integrate in- 
formation appropriately. This means he must 
weigh items of information differentially, ac- 
cording to their relevance, and must be able to 
qualify his interpretations of a given fact 
when other considerations make such qualifica- 
tion necessary. 


1 This research was supported by Grants MH 
04439 and MH 12972 from the United States Public 
Health Service. Computing assistance was obtained 
from the Health Sciences Computing Facility, Uni- 
versity of California, Los Angeles. Portions of this 
work were presented at the meetings of the Western 
Psychological Association, San Diego, March, 1968. 

2 The author wishes to thank Terry Ashwill for
his invaluable assistance in the design of the study 
and for his participation as an S, Robert Kraus for 
serving as the second S, Jerry Solomon and Russel 
Geiseman for their assistance in analyzing the data, 
and Sarah Lichtenstein and Leonard Rorer for their 
comments on the manuscript. 

Requests for reprints should be sent to the author, 
Oregon Research Institute, P. O. Box 3196, Eugene, 
Oregon 97403.


There is no need to dwell upon the tremendous
importance of being able to understand
and describe how the expert uses information.
However, such understanding does
not come easily. All too often expert judgment 
is regarded as a mysterious, intuitive phe- 
nomenon, incapable of being described pre- 
cisely. For example, Lusted (1960) relates a 
story about a radiologist famed for his diag- 
nostic ability. Once, when he was questioned 
as to why he thought a particular shadow on 
an X-ray was a metastatic lesion, the physician
replied, “Because it looks like it!” At the
other extreme is the expert who instructs 
others in the art of emulating his judgments 
by reeling off the dozens of factors that he 
takes into consideration, each accompanied by 
an elaborate rationale. Information of this 
sort is quite difficult for the student of ex- 
pertise to use and, in addition, may not ac- 
curately represent what the expert is really 
doing. 

Only in the past 20 yr. has there been any 
extensive study of the judgment process, and 
this study has been primarily within the con- 
text of clinical psychology. The earliest re- 
search efforts focused on the accuracy of 
judgments and the degree to which experts
agreed with one another in their evaluations.
The results of these studies have indicated 
a distressing lack of accuracy and interjudge 
agreement both in medicine (Garland, 1959, 
1960) and in clinical psychology (Goldberg, 
1968). 

As a result of these findings, the emphasis 
has shifted from research on the validity and 
reliability of judgments to attempts to under- 
stand the judgment process itself. This recent 
research aims to “simulate” or “model” the 
hidden cognitive processes of the judge. Hope- 
fully, by understanding these processes it will 
be learned why some judges are more accurate 
than others, and this knowledge will, in turn, 
help us to train persons to make better judg- 
ments. 

Some of the first models for describing 
quantitatively the judgment process were de- 
veloped by Hoffman (1960) and by Ham- 
mond and his associates (Hammond, Hursch, 
& Todd, 1964). While their techniques have 
been quite successful in describing how in- 
dividual items of information are weighted 
and combined by a judge, they have not been 
successful in describing complex patterned or 
configural use of information, that is, the 
process whereby an item of information is in- 
terpreted differently from one time to the 
next, depending on the nature of other avail- 
able information. Since experts generally claim 
that they use information configurally, it is 
important that techniques used to describe 
judgment be sensitive to such processes. 

One technique that analyzes the judgment 
process in all its complexity has been de- 
scribed by Kleinmuntz (1968), who had 
clinical psychologists and neurologists “think 
aloud” into a tape recorder as they made 
diagnostic judgments. Kleinmuntz utilized 
these rich introspective reports to construct a 
computer program simulating the diagnosticians’
thought processes. The resulting programs
were complex sequential (e.g., hierarchical
or “tree”) representations of the
diagnosticians’ verbal reports. At the present 
time it is not clear whether the failure of in- 
vestigators other than Kleinmuntz to find ex- 
perimental evidence for configurality stems 
from lack of configurality in the processes 
themselves or from deficiencies in the models
and procedures employed to evaluate those
processes (Goldberg, 1968). 

Hoffman, Slovic, and Rorer (1968) intro- 
duced a technique based on the analysis of 
variance (ANOVA) for the quantitative de- 
scription of both configural and nonconfigural 
use of information in judgment. They em- 
ployed this technique to study the processes 
whereby radiologists diagnose the malignancy 
of gastric ulcers on the basis of roentgeno- 
logical signs. Although the radiologists were 
found to process information configurally in 
many instances, the overall influence of such 
nonlinear processing was slight. Most of the 
variability in the diagnoses could be predicted 
from a linear combination of signs. 

Because the ANOVA technique proved 
quite capable of describing the use of in- 
formation by individual radiologists and be- 
cause it was sensitive to configural processing 
it appeared to merit further use. The purpose 
of the present paper was to test the adequacy 
of the ANOVA technique for describing the 
way that a stockbroker employs information 
as he evaluates the attractiveness of a com- 
pany’s stock. The stock market was selected 
as the domain in which to study expertise for 
several reasons. First, the task of predicting 
the future market price of a security is an 
important one. Hundreds of thousands of 
decisions, involving many millions of dollars, 
are made daily in the market. Secondly, this 
task is interesting because it is extremely dif- 
ficult and complex. There are hundreds of 
factors which may be relevant, some of them 
economic, some of them financial, and some 
of them psychological in nature. In addition, 
introspective reports by financial analysts in- 
dicate that they believe that the relevant 
factors should be interpreted in a complex 
configural manner. For example, many ana- 
lysts claim that one cannot interpret recent 
price changes of a stock without taking into 
account the volume of sales that accompanied 
those changes.3


METHOD 


Subjects. The Ss were two young brokers. Each
had about 3 yr. experience with a prominent brokerage
firm. While these brokers may, on occasion,
merely fill a client’s order, they frequently are called
upon for advice, and in some instances have complete
responsibility for managing a client’s portfolio.
These men are quite concerned about their ability to
judge stocks and spend several hours each day studying
the market, attempting to glean information
from a variety of sources such as newspapers, the
ticker tape, company reports, financial analysts’ reports,
etc.


3 Since writing this article, two very relevant
references have been called to the author’s attention.
These are Clarkson (1962) and Anderson (1969).

Procedure. The application of ANOVA to the study 
of judgment is simple and direct; first one selects a 
set of presumably relevant factors (i.e., items of in- 
formation or dimensions along which a stimulus can 
be described) and then he constructs stimuli such 
that all possible combinations of these factors are 
represented. When the judgments that are made 
about each of these stimuli are analyzed in terms 
of an ANOVA model, a significant main effect for 
Factor 1 indicates that the judge’s responses varied 
systematically with Factor 1 independent of the 
levels of the other factors. This implies that Factor 
1 was important to the judge. A significant interac- 
tion between Factors 1 and 2 implies that the judge 
was interpreting particular patterns of these factors 
in a configural manner; that is, the interpretation 
of Factor 1 differed as a function of the value taken 
by Factor 2. 
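The logic of this procedure can be sketched in a few lines. What follows is a minimal illustration only, assuming two dichotomous factors coded -1/+1 and a hypothetical judge whose rating rule (`configural_judge`) is invented for the example and is not taken from the study. The helper `contrast` is likewise an assumed name; it computes the difference between mean ratings that underlies a main effect or a two-way interaction in a full 2 x 2 design.

```python
from itertools import product
from math import prod
from statistics import mean


def configural_judge(f1, f2):
    """Hypothetical rating rule: Factor 1 counts for more when Factor 2 is high."""
    return 5.0 + 1.0 * f1 + 0.5 * f1 * f2


def contrast(ratings, factors):
    """Mean rating where the product of the named factor codes is +1,
    minus the mean rating where that product is -1."""
    plus = [y for levels, y in ratings.items()
            if prod(levels[i] for i in factors) == 1]
    minus = [y for levels, y in ratings.items()
             if prod(levels[i] for i in factors) == -1]
    return mean(plus) - mean(minus)


# One rating for every combination of the two factors (a full 2 x 2 design).
ratings = {levels: configural_judge(*levels)
           for levels in product((-1, 1), repeat=2)}

main_f1 = contrast(ratings, (0,))        # main effect of Factor 1
main_f2 = contrast(ratings, (1,))        # main effect of Factor 2
interaction = contrast(ratings, (0, 1))  # Factor 1 x Factor 2 interaction
```

With this invented rule the main effect of Factor 2 is zero while the interaction contrast is not, which is the pattern the text describes as configural: the meaning of Factor 1 depends on the level of Factor 2.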

The present task was constructed with the as- 
sistance of Broker A. When asked to list the mini- 
mum number of factors upon which he could com- 
fortably base a recommendation about a stock, 
Broker A selected 11 variables commonly provided 
in Standard & Poor’s reference reports. These variables 
were (a) Yield (YLD). The cash dividend income 
for the past year as a percentage of the market price. 
(b) Near Term Prospects (NTP). A one- or two- 
paragraph forecast concerning sales, profits, dividends, 
earnings, etc., for the coming year. Included is 
pertinent information about new products, political 
or economic factors bearing on the company’s future, 
etc. (c) Earnings Quarterly Trend (EQT). A com- 
parison of quarterly earnings over the past 4-5 yr. 
(d) Past Year’s Performance (PYP). A synopsis 
of relevant statistics for the past year. Includes 
revenues, earnings and dividends, and political and 
economic factors that influenced them. (e) Profit 
Margin Trend (PMT). A yearly comparison indi- 
cating the trend in percentage of profit from com- 
pany operations per sales dollar. Presumably this 
relates to the efficiency with which the company is 
managed and has implications for future earnings. 
(f) Earnings Share Yearly Trend (EYT). (g) Price/ 
Earning Ratio (PER). The ratio of market price to 
net earnings per share over the past 12 mo. (h) 
Shares Outstanding (SO). The number of shares of 
common stock issued by the company. (i) Resistance
Trend (RES). Trend of a line connecting several
recent high points on the chart of daily price action.
(j) Support Trend (SUPP). Trend of a line connecting
several recent low points on the daily price
chart. (k) Sales Volume Trend (VOL). Trend of the
number of shares traded per day over a recent
period of time.

Fig. 1. Example of a typical stimulus company.

Next, Broker A was asked whether, in the in- 
terests of simplification, he could still make a rea- 
sonable evaluation of a company’s stock if informa- 
tion about the 11 factors were presented in di- 
chotomous form (e.g., yield being described as either 
high or low, trends as either up or down, etc.). The 
broker said that he could. Further questioning in- 
dicated that there would be no combination of 
these factors so unreasonable as to make the com- 
pany seem unreal and, therefore, impossible to judge. 

The next step involved the construction of hypo- 
thetical companies. Ideally it would have been de- 
sirable to combine the 11 dichotomous factors in all 
possible ways, but in this case that would have resulted
in 2^11 or 2048 companies, clearly an unmanageable
number to judge. However, if one is willing
to assume that the higher order interactions are 
negligible, it is possible, by means of a fractional 
replication design (Cochran & Cox, 1957), to evaluate 
the main effects and lower order interactions with a 
considerably reduced number of stimuli. 
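A fractional replication of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' actual plan: the text does not say which design from Cochran and Cox was used, so the generators in `GENERATORS` are a hypothetical choice. Seven factors are crossed completely (2^7 = 128 runs) and the remaining four are defined as products of those seven, giving a 1/16 fraction of the 2^11 design. The function `is_clear` (an assumed name) checks that no main effect or two-way interaction column coincides with another.

```python
from itertools import combinations, product

# Hypothetical generators: each added factor (column index on the left) is the
# product of the listed base columns (indices 0-6). Not taken from the study.
GENERATORS = {
    7: (0, 1, 2, 6),
    8: (1, 2, 3, 4),
    9: (0, 2, 3, 5),
    10: (0, 1, 2, 3, 4, 5, 6),
}


def build_design():
    """Return the 128 runs of the fraction as tuples of -1/+1 codes (11 factors)."""
    runs = []
    for base in product((-1, 1), repeat=7):
        row = list(base)
        for col in sorted(GENERATORS):
            sign = 1
            for j in GENERATORS[col]:
                sign *= base[j]
            row.append(sign)
        runs.append(tuple(row))
    return runs


def is_clear(design):
    """True if all main-effect and two-way interaction columns are mutually
    orthogonal, i.e., none of them is confounded with another."""
    k = len(design[0])
    cols = [[r[i] for r in design] for i in range(k)]
    cols += [[r[i] * r[j] for r in design] for i, j in combinations(range(k), 2)]
    return all(sum(a * b for a, b in zip(u, v)) == 0
               for u, v in combinations(cols, 2))


design = build_design()
```

Under these assumed generators the main effects and two-way interactions are confounded only with interactions of three or more factors, which is why the approach depends on higher order interactions being negligible.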

Previous work on judgment (Goldberg, 1968) suggested
that the assumption that higher order interactions
would be negligible was not too unreasonable.
Therefore, hypothetical companies were constructed
by combining the levels of the 11 factors according
to a 1/16 fractional replication of a 2^11
factorial ANOVA design. This produced a set of 128
companies. This reduction of stimuli results in the 
confounding of main effects and two-way interactions 
with certain of the higher order interactions. Other 
high order interactions serve to estimate the error 
term in the ANOVA. Thus, if configural use of three
or more factors does occur, the error term will be
inflated.

TABLE 1

RELATIVE IMPORTANCE OF THE 11 FACTORS AND THEIR SIGNIFICANT
INTERACTIONS FOR BROKER A

                                  Description of levels   Mean judgment          Magnitude                   % Variance
Factor                            Level 1    Level 2      Level 1   Level 2      of effect^a   MS            (ω²)

Main effects
  Yield (YLD)                     Low        High         5.56      5.67          .11           .4           .001
  Near Term Prospects (NTP)       Poor       Good         4.53      6.70         2.17          [illegible]   .334
  Earnings Quarterly Trend (EQT)  Down       Up           4.87      6.36         1.49          70.5**        .156
  Past Year’s Performance (PYP)   Poor       Good         5.40      5.83          .43          [illegible]   .013
  Profit Margin Trend (PMT)       Down       Up           5.44      5.80          .36           4.1*         .009
  Earnings Yearly Trend (EYT)     Down       Up           5.56      5.67          .11           .4           .001
  Price/Earnings Ratio (PER)      Poor       Good         4.80      6.44         1.64          86.1**        .190
  Shares Outstanding (SO)         Few        Many         5.70      [illegible]  [illegible]    1.0          .002
  Resistance Trend (RES)          Down       Up           5.50      [illegible]  [illegible]    1.8          .004
  Support Trend (SUPP)            Down       Up           5.39      5.84          .45           6.6**        .015
  Sales Volume Trend (VOL)        Down       Up           5.69      5.55          .14           .6           .001

Interactions
  EYT X SUPP                                                                      .36           4.1*         .009
  YLD X PMT                                                                       .39           4.9*         .011
  RES X SUPP                                                                      .48          [illegible]   .017
  RES X VOL                                                                       .36          [illegible]   .009
  SUPP X VOL                                                                      .39           4.9*         .011
  RES X SUPP X VOL                                                                .42          [illegible]   .013

Error                                                                                           1.0

Sum of effects over the statistically significant factors
  (main effects)                                                                 6.54 (73%)                  .717
  (interactions)                                                                 2.40 (27%)                  .070
  Total                                                                          8.94                        .787

^a Based on the degree to which the mean judgment changes as the factor changes.
* p < .05.
** p < .01.

Figure 1 illustrates the way in which information 
about a company was displayed to the brokers. The 
spatial format of the variables was designed to ap- 
proximate the layout of a Standard & Poor’s report 
as closely as possible. The stimuli were bound in a 
notebook which the brokers took home. The brokers 
worked on the judgments in their leisure time over a 
3-wk. period. Broker A reported spending 102 
hr. making his judgments. Broker B spent about 
9 hr. at the task. Although they knew the companies 
were hypothetical, both brokers reported that the 
task was extremely interesting to them and that they 
were able to conjure up images of real companies as 
they read the stimulus information. 

The brokers were asked to make a recommenda- 
tion about each company based on their judgment 
of the likelihood that the market price of that com- 
pany’s stock would increase substantially in the 
next 6-12 mo. The recommendation was made on a 
9-category rating scale where Category 1 was labeled 
“strong recommendation not to buy,” Category 4 
was a “slight recommendation not to buy,” Category
5 was a “neutral” evaluation, and Categories 6 and
9 were labeled slight and strong “recommendations 
to buy,” respectively. 


RESULTS 


The mean rating given the 128 companies
by Broker A was 5.62 with a standard devia- 
tion of 1.94. Broker B was less favorably in- 
clined towards the companies’ stocks (M = 
3.96) and more variable in his ratings (SD 
= 2.96). 

Despite the fact that Broker B was re- 
cruited as an S by Broker A on the grounds 
that his approach to selecting stocks was rela- 
tively similar to that of Broker A, there was 
rather poor agreement between the two with 
regard to their ratings. The correlation be- 
tween the two brokers’ judgments across the 
128 companies was only .32.

TABLE 2

RELATIVE IMPORTANCE OF THE 11 FACTORS AND THEIR SIGNIFICANT
INTERACTIONS FOR BROKER B

                                  Mean judgment        Magnitude                   % Variance
Factor                            Level 1   Level 2    of effect^a   MS            (ω²)

Main effects
  Yield (YLD)                     3.91      4.02        .11           .4           .000
  Near Term Prospects (NTP)       3.41      4.52       1.11          40.0**        .082
  Earnings Quarterly Trend (EQT)  3.75      4.17        .42          [illegible]   .011
  Past Year’s Performance (PYP)   3.89      4.03        .14           .6           .000
  Profit Margin Trend (PMT)       3.26      4.66       1.40          61.9**        .129
  Earnings Yearly Trend (EYT)     2.91      5.02       2.11         142.4**        .299
  Price/Earnings Ratio (PER)      3.12      4.80       1.68          89.4**        .187
  Shares Outstanding (SO)         4.03      3.89        .14           .6           .000
  Resistance Trend (RES)          3.50      4.42        .92          [illegible]   .056
  Support Trend (SUPP)            3.61      4.31        .70          [illegible]   .032
  Sales Volume Trend (VOL)        3.83      4.09        .26          [illegible]   .003

Interactions
  YLD X PER                                             .30           2.8*         .005
  EYT X PER                                             .55           9.6**        .019
  PYP X PMT                                             .33           3.4*         .006
  RES X VOL                                             .39           4.9**        .009
  SUPP X VOL                                            .42          [illegible]   .011

Error                                                                 0.6

Sum of effects over the statistically significant factors
  (main effects)                                       8.34 (81%)                  .799
  (interactions)                                       1.99 (19%)                  .050
  Total                                               10.33                        .849

^a Based on the degree to which the mean judgment changes as the factor changes.
* p < .05.
** p < .01.


In order to isolate the factors influencing 
the recommendations, a separate ANOVA 
was performed on each broker’s responses. 
Sums of squares and mean squares were com- 
puted for each of the 11 main effects (indi- 
vidual factors), each of the two-way interac- 
tions, and each of the few three-way interac- 
tions that were confounded only with four- 
way or higher order interactions. In addition, 
two indexes of the importance of a factor or 
interaction were computed for each effect. 
One was simply the standard calculation of 
the magnitude of an effect, based upon the 
degree to which the mean judgment shifted 
as the levels of a factor were varied. In this 
regard, the magnitude of a two-way interac- 
tion effect indicates the degree of change in 
the mean judgments as a function of varia- 
tion in the levels of a pair of factors after the 
main effects have been partialed out. The
second index, called ω², is a function of the
squared magnitudes of effect and provides an 
estimate of the proportion of the total vari- 
ance in the broker’s judgments that could 
be attributed to a particular main effect or 
interaction (Hays, 1963). 
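The relation between the two indexes can be sketched numerically. The code below is an illustration under assumptions: for a two-level factor in a balanced design of N stimuli, the sum of squares equals N times the squared difference between the level means, divided by 4, and with one degree of freedom the mean square equals that sum of squares. The ω² function uses the usual fixed-effects estimate associated with Hays; whether the authors computed it in exactly this way is not stated in the text. The function names and the numbers passed to `omega_squared` are hypothetical.

```python
def mean_square(magnitude, n_stimuli):
    """Mean square (df = 1) for a two-level effect in a balanced design,
    given the difference between the two level means."""
    return n_stimuli * magnitude ** 2 / 4


def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """Estimated proportion of total variance attributable to an effect."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# Broker A, Price/Earnings Ratio: magnitude of effect 1.64 over 128 companies.
ms_per = mean_square(1.64, 128)

# Purely hypothetical sums of squares, chosen only to show the calculation.
example_w2 = omega_squared(ss_effect=50.0, df_effect=1, ms_error=1.0,
                           ss_total=199.0)
```

The first call gives a value of about 86.07, close to the mean square of 86.1 reported for this factor; small differences of this kind would be expected from rounding of the tabled means.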

Tables 1 and 2 present the results of the 
ANOVAs for the two brokers. The ratings 
of Broker A changed significantly with varia- 
tion in the levels of each of six factors (main 
effects), the most influential of these being 
Near Term Prospects, Price/Earnings Ratio, 
and Earnings Quarterly Trend. In addition, 
six interactions were significant, one of these 
(Resistance Trend X Support Trend) being 
the fourth strongest effect. Broker B exhibited 
seven significant main effects, the strongest 
of which were due to the Earnings Yearly 
Trend, Price/Earnings Ratio, and Profit
Margin Trend. In addition, five two-way interactions
were significant.

Since the 11 factors studied here were 
specifically selected by Broker A as the most 
important ones from among a much larger 
set, the fact that his judgments were not in- 
fluenced significantly by a number of these 
factors is especially noteworthy. During the 
process of selecting these factors the broker 
was able to give an elaborate rationale for 
including each one. Perhaps it was too difficult 
for him to use all of the factors simul- 
taneously. 

Fig. 2. The relative importance of each factor for
Brokers A and B.

Summing the ω² index over the statistically
significant factors indicated that about 72%
of the variance in Broker A’s responses was
predictable from knowledge of six main effects
and an additional 7% could be attributed to
configural use of cues (significant interactions).
Comparable figures for Broker B were
80% (main effects) and 5% (interactions).
These percentages could be interpreted as
evidence for the negligibility of configural cue
utilization as were the comparable percentages
found in the study of radiologists by Hoffman,
Slovic, and Rorer (1968). However, the
use of variance percentages as descriptive indicators
may be more meaningful statistically
than psychologically, and the magnitude of
effect index, based upon the influence of a
factor upon the mean judgments, might well
be a more appropriate gauge for assessing
the relative importance of configural effects.
This index indicates that configurality was 
substantial, accounting for 27% of the total 
effects on Broker A and 19% of the effects on 
Broker B. Even this is a conservative estimate 
of the degree of configurality. Extrapolating 
from the excellent discussions of linear and 
configural models presented by Green (1968) 
and Hayes (1968), one could argue that 
whenever the interaction between two fac- 
tors was significant, those factors were being 
used configurally and the variance accounted 
for by both their main effects and their inter- 
action should be counted as configural vari- 
ance. Following this rule would boost the per- 
centage of configural variance to 36% for 
Broker A and 85% for Broker B. Additional 
evidence for the argument that meaningful 
configural information processing was taking 
place here is the fact that two interactions 
(RES X VOL and SUPP X VOL) were com- 
mon to both brokers. Detailed analysis of 
these interactions showed each of them to 
be almost identical in form for the two 
brokers. 
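The 27% and 19% figures quoted above follow directly from the summed magnitudes of effect in Tables 1 and 2. As a small worked check (the function name is an assumption introduced here), the configural share is the summed magnitude of the significant interactions divided by the summed magnitude of all significant effects:

```python
def configural_share(main_sum, interaction_sum):
    """Share of the summed magnitude of effect that is due to interactions."""
    return interaction_sum / (main_sum + interaction_sum)


# Sums of the statistically significant effects as given in Tables 1 and 2.
broker_a = configural_share(main_sum=6.54, interaction_sum=2.40)
broker_b = configural_share(main_sum=8.34, interaction_sum=1.99)

percent_a = round(100 * broker_a)
percent_b = round(100 * broker_b)
```

The same ratio applied to the ω² sums gives much smaller shares (.070 of .787 and .050 of .849), which is the contrast between the two indexes that the text draws.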

An index of the overall importance of a 
given factor was calculated by summing the 
magnitude of effect index for the main effect 
of that factor with the magnitude of effect 
indexes of all significant interactions contain- 
ing that factor. The summed effect of a given 
factor was divided by the sum of the effects 
of all factors. This index of importance was 
thus a percentage score where the sum of all 
percentages totaled 100. 
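The index just described can be sketched as follows. The factor names and magnitudes in the example are hypothetical and serve only to show the bookkeeping; the function name `importance` is also an assumption. Each significant interaction is credited in full to every factor it contains, as the description above implies.

```python
def importance(main_effects, significant_interactions):
    """Percentage importance of each factor: its main-effect magnitude plus the
    magnitudes of all significant interactions containing it, normed to 100."""
    summed = dict(main_effects)
    for factors, magnitude in significant_interactions.items():
        for factor in factors:
            summed[factor] += magnitude
    total = sum(summed.values())
    return {factor: 100.0 * value / total for factor, value in summed.items()}


# Hypothetical magnitudes of effect for three factors and one interaction.
example = importance(
    main_effects={"F1": 2.0, "F2": 1.0, "F3": 0.5},
    significant_interactions={("F1", "F2"): 0.5},
)
```

In this made-up case F1 and F2 each gain 0.5 from the shared interaction, so the summed effects are 2.5, 1.5, and 0.5 and the percentages are these values divided by 4.5.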

Figure 2 illustrates the relative importance 
of the 11 factors for each broker based on 
the index just described. Despite the fact that 
the brokers viewed themselves as similar in 
orientation, there was a considerable differ- 
ence in their use of information. These differ- 
ences undoubtedly indicate why they dis- 
agreed so often in their rating of a particular 
stock. Broker A considers himself to be a 
“technical-analyst” (i.e., one who weights in- 
formation from price and volume charts espe- 
cially heavily) and in this regard it is note- 
worthy that the ANOVA model showed him 
to be using the three chart variables, Re- 
sistance, Support, and Volume Trends, to a 
greater extent than did Broker B, who appears
to be more of a “fundamentalist” (i.e.,
one who relies on traditional balance sheet
and income indicators).

Fig. 3. Comparison between the strength of effect
index, subjective weight, and ω² for Broker A.

Validity of subjective weights. How closely 
would the brokers’ subjective impressions of 
the relative importance of the 11 factors con- 
form to the importance indexes calculated 
from the ANOVA model? To provide an 
answer to this question, each broker was 
asked, after completing his ratings, to dis- 
tribute 100 points over the 11 factors pro- 
portionally to his feelings about their im- 
portance in determining his judgments. These 
subjective weightings were compared with the 
magnitude of effect indexes pictured in Figure 
2 and with the ω² index, the latter also being
combined over both main effects and interac- 
tions and normed to sum to 100 over the 11 fac- 
tors. The results of this comparison are de- 
picted in Figures 3 and 4. They show that the 
subjective weightings of Broker A were ex- 
tremely close to the magnitude of effect index 
while Broker B had less accurate insight into 
his use of the various factors. The ω² index
was very discrepant from the subjective 
weights of both brokers. This index tended 
to exaggerate the differences between the most 
important factors and the lesser ones. To the 
extent that one feels that expert judges should 
have some insight about their own weighting 
system, this result implies that the magnitude 
of effect index is a better measure of a factor’s 
relative importance than the ω² index.

Fig. 4. Comparison between the strength of effect
index, subjective weight, and ω² for Broker B.

Fig. 5. Graphical representation of selected interaction
effects for Broker A.

Analysis of interactions. The finding of a
significant main effect or interaction is only
a first step in understanding how a judge
interprets information. It should be viewed
as a signal that something interesting is going
on. Graphical representation of an effect followed
by interrogation of the judge concerning
the rationale behind his behavior can be
used to further one’s understanding of the 
effect. To illustrate, three of the significant 
interactions found in the judgments of Broker 
A are pictured in Figure 5. Broker A was 
shown these figures and was asked to provide 
an explanation for each one. A paraphrased 
version of his explanation for each effect 
follows. 

1. YLD X PMT effect. Why is high yield 
a more favorable indicator than low yield 
when PMT is down while the reverse is true 
when PMT is up?—Because when PMT is 
down, earnings probably are down, and ac- 
cordingly the price of the stock should de- 
cline. A low dividend yield would make the 
stock even less attractive while a high yield 
would tend to compensate for the poor earn- 
ings prognosis. When PMT is up, earnings are 
probably up and the outlook for price 
appreciation is good. A quality company 
whose earnings portend good growth doesn’t 
usually offer a large dividend, so low yield in 
conjunction with a rising PMT suggests that 
the stock has a very promising future. A high 
yield in this case suggests that the company 
is probably not putting enough of its capital 
into growth or perhaps that the outlook for 
future price appreciation is not really so 
promising, hence the need for a larger dividend 
to make the stock attractive to the investor. 

2. EYT X SUPP effect. Why should a ris- 
ing trend in yearly earnings be a better sign 
than a declining earnings trend when the sup- 
port trend (price) is down while the reverse 
is true when the support trend is up? When
both support and earnings trends are down, 
the stock has nothing going for it. But if the 
support trend is down despite the fact that 
the earnings are going up, the market may be 
generally bad and this may be a good time 
to buy the stock. In contrast, when the sup- 
port and earnings trends are both rising, the 
stock may have already made its move and 
thus may be overpriced, while a rising support 
trend in conjunction with declining earnings 
may indicate that the smart money knows the 
earnings will be up next year and the stock 
may be a very good buy.

3. VOL X SUPP effect. Why is rising volume
viewed as a favorable indicator when it
occurs in conjunction with a rising support 
level and viewed as a relatively unfavorable 
sign when it occurs with a stock whose price 
is declining?—A stock that is declining on 
relatively low volume is considered to be 
strong. People have enough confidence in it 
to hang on to it, rather than sell. If price 
declines on high volume the story is different. 
Everyone is selling and the prospects are 
thought to be very poor. Similarly, if volume 
is down on a stock that has been appreciating 
in price, confidence in that stock’s future must 
be low, in contrast with a stock that is rising 
because many people are buying it (high 
volume). 


Discussion 


The results of the present study indicate 
that the ANOVA technique has considerable 
promise as a device for describing and fur- 
thering the understanding of complex judg- 
ment processes. It is likely that this tech- 
nique can provide even the expert with new 
insight into his inferential processes. Further- 
more it might also be a valuable teaching 
device that would enable “trainees” to see 
exactly how their own processes differ from
those of their expert model (see Todd & Hammond,
1965, for a related idea). Imagine the
difficulty of asking the expert to describe his 
judgment processes in detail, obtaining a 
series of descriptive paragraphs such as those 
given above to describe interactions, and then 
trying to fit all these together in a way that 
would enable you to emulate his judgments. 
The task is extremely difficult if not impos- 
sible, yet this is a common way in which 
expertise is communicated. However, such in- 
trospective comments become considerably 
more helpful when they are accompanied by 
the precise quantitative descriptions provided 
by the ANOVA technique. 

The present results are important in an- 
other way. They provide experimental evi- 
dence to support the commonly believed no- 
tion that judges use information configurally. 
The results of previous studies, most of which 
used less direct methods to infer the importance
of configural processes, have led a
number of workers to assert that humans
are predominantly linear information proc- 
essors (see discussions of this issue by Hoff- 
man, 1968, and Goldberg, 1968). It is now 
clear that substantial configural processing of 
information does occur and can readily be 
detected by the ANOVA technique.


This paper attempts a replication of a study by Mehrabian (1965) who extended
studies of speech behavior to apply to letters of recommendation. His
Ss were asked to write two letters, both positive. For the first letter they 
were asked to assume strong liking for the person they were describing; for 
the second letter they were to assume strong dislike for the person being 
described. Mehrabian’s Ss wrote significantly more words in the first letters. 
Our replication fully confirms his results. These findings suggest that the written 
channel of communication may be as sensitive a mirror of S’s underlying 
attitudinal state as earlier research had revealed was the case with the spoken
channel of communication.


Earlier studies reviewed in Matarazzo, 
Wiens, and Saslow (1965, pp. 203-204) have 
indicated that such interviewer tactics as 
head-nodding, saying Mm-Hmm, increasing 
his own utterance lengths, and related social 
reinforcers are perceived by an interviewee as 
“greater interest in the interviewee on the
part of the interviewer.” This perception, in 
turn, increases S’s “level of satisfaction in 
this interpersonal encounter.” This resulting 
motivational state was hypothesized as a pre- 
liminary conceptual framework within which 
to understand our often cross-validated finding 
that such disparate interviewer tactics were 
empirically, and predictably, followed by 
marked increases in the interviewee’s own 
average length of utterance. In an interesting 
extension of these findings, Mehrabian (1965) 
tested for related phenomena of a communi- 
cator’s underlying state in a study involving 
the written, in contrast to the spoken, channel 
of communication. He reasoned that number
of words written in a letter of recommendation
might be more explicit as an index of
a writer’s true attitude toward the person
about whom he was writing than was the
content of the letter per se. His results, based
on actual letters written by 69 college student
Ss under two instruction-induced attitudinal
sets (Like versus Dislike for the
person being recommended) clearly revealed
that a communicator writes more words about
a person whom he likes.

1 This research was carried out in conjunction with
support from the Office of Scientific Research, Office
of Aerospace Research, United States Air Force,
under AFOSR Grant Number XG-3057. Thomas S.
Manaugh received National Institutes of Health fellowship
support from NIGMS Training Grant Number
GM 1495-02.

2 Requests for reprints should be sent to Arthur
N. Wiens or Joseph D. Matarazzo, Department of
Medical Psychology, University of Oregon Medical
School, 3181 S.W. Sam Jackson Park Road, Portland,
Oregon 97201.

In the past several years continuing re- 
search on interviewee speech behavior like- 
wise has shifted its emphasis toward more 
explicit and direct concern with an S’s under- 
lying motivational and attitudinal state. Early 
results from this new research direction 
(Manaugh, Wiens, & Matarazzo, in press; 
Matarazzo, Wiens, Manaugh, & Jackson, in 
press) strongly suggest that discussion 
of some content areas elicits evidence of 
motivational states that are more salient 
than are those tapped by still other content 
areas. Whether such a motivationally salient 
state is endogenously present or exogenously 
induced in the interview, our own accumulat- 
ing evidence is that speech behavior during an 
interview appears to mirror it successfully. 
The publication of Mehrabian’s finding, then, 
is of considerable interest because it suggests 
that investigators may now be able to use 
the written channel of communication as well
as the spoken channel in further research on
underlying attitudinal and motivational states. 

With Mehrabian’s help the present study is 
an exact replication (in Portland) of his 
former (Los Angeles) study. The reader will 
find the method adequately described in 
Mehrabian (1965). He used college student 
Ss, as did the present authors (although our 
Ss were slightly older, upper-division col- 
legians who were enrolled in one of two 
similar classes from a single instructor). In 
both Mehrabian’s and the present study, all 
the data were collected in a single classroom 
session. For half his N the instructions were
general (open-ended) and required each S, in
counter-balanced order across two groups, to
write two letters of positive recommendation:
one about a person he “liked,” the other
about a person he “disliked.” For the remaining
half of his total N, Mehrabian (and the
present authors) gave the writer of the two
letters the additional specific information that
the person S was writing about: “____ is applying
for a job in which you are to discuss
his (her) character, intelligence, ability and
perseverance at work [p. 520].” The present
study used a total of 72 Ss, with Ns of 36 and
36 in the general and specific instruction conditions,
whereas Mehrabian had a total of
69, with Ns of 38 and 31, respectively.





RESULTS 


Table 1 presents the present authors’ find- 
ings and includes Mehrabian’s for comparison. 
It is clear that under either general (121.7 
vs. 90.9) or specific (128.9 vs. 102.4) in- 
structions, the present Ss also wrote more 
words per letter about a person they liked 
than they did about a person they disliked 
(p < .001 in both instances). Word counts
were made by two secretaries who were un- 
aware of the hypothesis. The Pearson r be- 
tween their independent word counts was .998. 

TABLE 1

MEAN NUMBER OF WORDS WRITTEN IN A LETTER OF
RECOMMENDATION BY SUBJECTS UNDER TWO
INSTRUCTION-INDUCED ATTITUDINAL STATES

                                         Assumed attitudinal set toward the
                                         person about whom Ss were asked to
                                         write a favorably toned letter
Study                                    Person liked   Person disliked   p

Present study (N = 72)
  Writers given general instructions
    M (No. of words)                     121.7           90.9             .001
    SD                                    35.5           29.4
  Writers given specific instructions
    M (No. of words)                     128.9          102.4             .001
    SD                                    38.9           32.2
  p                                      ns
Mehrabian study (N = 69)
  Writers given general instructions
    M (No. of words)                     106.4           92.7             .05
  Writers given specific instructions
    M (No. of words)                     [illegible]    108.6             .01
  p                                      .05

A number of additional analyses were conducted.
Reading down Table 1, in neither
the Like condition nor the Dislike condition
did the number of words written under general
instructions differ significantly from the
number written under specific instructions.
Reading across but not shown in Table 1
(as one would expect) the number of words
written by the 36 Ss (in the
general instruction group) under the Like
condition correlated highly with the number 
of words written by these same 36 Ss in the 
Dislike condition (r of .60). For the specific 
instruction group the comparable r was .62.
Thus how many words any single S wrote in 
a letter had a fairly high degree of stability 
across the two attitudinal conditions. This 
gives even additional strength to the main 
finding in Table 1 that, this fact notwith- 
standing, taking an S as his own control, the 
average S in this study wrote more words 
when he was asked to write a positive letter 
with an underlying positive attitude than 
when he was asked to write a positive letter 
with an underlying negative attitude. The 
former attitudinal set evoked some 25% 
greater productivity in the two conditions 
(128.9 vs. 102.4 words; and 121.7 vs. 90.9 
words; both p< .001). In still additional 
analyses we found (a) no significant effect 
due to the order or sequence in which the 
positive letters about the liked and disliked 
persons were written and (b) no correlation
between an S’s vocabulary level as measured 
by the Shipley-Hartford test and the number 
of words written by him in any of the four 
conditions shown in the top half of Table 1. 
This latter finding strongly affirms that the 
presumed attitudinal effect shown in Table 1 
is, in fact, that—attitudinal—and is not an 
effect due to intellectual competence or other
related cognitive factors which might be differentially
affected by one’s level of intel-
ligence. It may be of interest to the reader 
that numerous attempts in our laboratory 
over the past decade to find differences in the 
noncontent speech and silence behavior (means 
per utterance) of Ss with very high, average, 
and below average WAIS IQs have failed to 
reveal any such differences. That is, an S’s 
mean length of speech per utterance and his 
characteristic (mean) reaction time before 
speaking are not related to his WAIS IQ, 
including Verbal or Performance IQ. Thus, 
this similar finding with the Shipley-Hartford 
test in the present study adds still further 
support to the basic (motivational) saliency 
hypothesis currently being explored by the 
present authors, Mehrabian, and a host of 
other investigators. 
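The point made earlier in this section, that the stability of word counts across conditions strengthens the comparison when each S serves as his own control, can be illustrated with the summary statistics in Table 1. The sketch below assumes a paired (correlated-samples) t test reconstructed from the reported means, standard deviations, and correlations; the text does not name the test actually used, so this is an illustration and not a reanalysis. The function name `paired_t` is introduced here.

```python
from math import sqrt


def paired_t(mean_1, mean_2, sd_1, sd_2, r, n):
    """t statistic for a correlated-samples comparison, using the fact that the
    variance of a difference is sd_1^2 + sd_2^2 - 2 * r * sd_1 * sd_2."""
    sd_diff = sqrt(sd_1 ** 2 + sd_2 ** 2 - 2 * r * sd_1 * sd_2)
    return (mean_1 - mean_2) / (sd_diff / sqrt(n))


# Like vs. Dislike letters, present study (values from Table 1 and the text).
t_general = paired_t(121.7, 90.9, 35.5, 29.4, r=0.60, n=36)
t_specific = paired_t(128.9, 102.4, 38.9, 32.2, r=0.62, n=36)

# Same data with the correlation ignored, for comparison.
t_general_r0 = paired_t(121.7, 90.9, 35.5, 29.4, r=0.0, n=36)
```

Because the positive correlation shrinks the standard deviation of the differences, the statistic is larger than it would be if the two sets of letters were treated as uncorrelated, which is consistent with the argument in the text.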


DISCUSSION 


The written and spoken channels of com- 
munication have a complex and far from con- 
sistently positive relationship with each other, 
for example as reported by Drieman (1962a, 
1962b) from Holland in 8 Ss whose written 
and spoken productions were exhaustively 
analyzed. Nevertheless, the purpose of the 
present replication and modest extension of 
Mehrabian’s study was for it to serve as a 
beginning bridge and test, in a second and 
not necessarily related channel of communication,
of the fairly consistent finding with the
spoken channel alone: namely, that one’s un- 
derlying attitudinal or motivational state can 
be reflected by changes or differences in this 
(speech) communicative channel. The re- 
sults in Table 1 suggest that the written 
channel of communication may be as effective 
as the spoken in mirroring underlying mood 
and other motivational states.


Biographical inventory keys were developed and cross-validated in a total
sample of 400 public high school girls, subdivided into creative and matched 
control groups in art and writing. Creative Ss were selected principally through 
teachers’ nominations supported by creative products. Cross-validation yielded 
criterion correlations of .34 and .55 for art and writing keys (each p < .001).
Final keys, comprising items that differentiated in both initial and
cross-validation samples (compound p < .05), were used to describe the biographical
correlates of creativity. Comparisons were also made with the results of a
similar, earlier study of creative boys.


Research on the nature and correlates of 
creativity has been accumulating at an in- 
creasing rate. Golann (1963) classified the 
various approaches with reference to their 
emphasis on products, process, measurement, 
or personality. Methodologically, investiga- 
tions differ in their use of evaluated achieve- 
ment (which focuses on products) or test per- 
formance as criteria of creativity. The test 
criterion is open to criticism because of limita- 
tions of test coverage and inadequate or in- 
consistent validation data. For these reasons, 
the criterion employed in this study was 
evaluated achievement. More specifically, the 
present criterion reflected the essential con- 
ditions of creativeness proposed by Mac- 
Kinnon (1962), which include (a) novelty, 
originality, or statistical infrequency; (b)
adaptiveness to reality, involving the achieve- 
ment of some reality-oriented goal, such as 
the solution of a scientific or aesthetic prob- 
lem; and (c) sustained activity leading to the 
development, evaluation, and elaboration of 
the original idea. It is apparent that tests con- 
centrate on the first of these conditions, 
largely neglecting the last two. 

In the effort to identify the correlates of 
creativity, different investigators have em- 
ployed aptitude and personality tests, inter- 
views, and biographical inventories. The bio- 
graphical inventory provides a standardized 

1 This study is part of a larger project supported 
by Subcontract No. 2 of the Center for Urban Edu- 
cation, Contract OEC-1-7-062868-3060 with the 
United States Office of Education. 

2 Requests for reprints should be sent to Anne
Anastasi, Department of Psychology, Fordham University,
Bronx, New York 10458.


group procedure for gathering information 
about the individual’s experiential history and 
about relevant aspects of the psychological 
environment in which he developed. Insofar 
as environment may play a significant role in 
the development of creativity, the biographi- 
cal inventory technique should serve a dual 
function: (a) prediction of subsequent creative
achievement in individuals, and (b) identification
of environmental variables conducive to
the development of creative behavior. 

As predictive instruments, biographical in- 
ventories have repeatedly demonstrated satis- 
factory validity against complex industrial, 
military, and educational criteria (Freeberg, 
1967; Henry, 1966). With regard to creative 
achievement, they have proved effective in 
differentiating between levels of creativity in 
several groups of scientific research workers. 
Such results have been obtained with petro- 
leum research scientists (Morrison, Owens, 
Glennon, & Albright, 1962; Smith, Albright, 
Glennon, & Owens, 1961), with a variety of 
research personnel in a pharmaceutical com- 
pany (Buel, 1965; Tucker, Cline, & Schmitt, 
1967), with engineers (McDermid, 1965), 
and with psychologists and chemists (Cham- 
bers, 1964). In a series of studies of scien- 
tists in the National Aeronautics and Space 
Administration (NASA), Taylor, Ellison, and 
Tucker (1966) obtained validity coefficients 
in the .40s and .50s when biographical in- 
ventory keys were cross-validated against 
several criteria of creative achievement. It is 
also noteworthy that such biographical in- 
ventory keys have shown substantial validity 
generalization when applied to research scientists
in other fields (Buel, Albright, & Glennon,
1966; Cline, Tucker, & Anderson, 1966).

Investigations at the high school and col- 
lege level have usually employed tests as 
criteria, predictors, or both. Few studies have 
utilized biographical inventories and still
fewer have done so against a criterion of 
evaluated achievement. Taylor, Cooley, and 
Nielson (1963) applied a modified version of 
the biographical inventory developed on 
NASA scientists to high school students par- 
ticipating in a summer science program sup- 
ported by the National Science Foundation. 
This biographical inventory proved to be the 
best overall predictor of creative research per- 
formance in these students, its validity being 
as high as .47 in one of the groups. Parloff 
and Datta (1965) compared contrasted groups 
of participants in the Westinghouse Science 
Talent Search, selected on the basis of judges’ 
ratings of their research projects. However, 
these groups were compared chiefly in per- 
sonality test scores, the only background items 
reported being father’s occupation, socio- 
economic level, and intactness of family. 
Dauw (1966) successfully differentiated be- 
tween highly creative and less creative adoles- 
cents by means of a biographical inventory, 
but his Ss were chosen on the basis of crea- 
tivity tests only. 

A series of studies conducted for the National
Merit Scholarship Corporation reports
significant relationships between biographical 
data and subsequent creative achievement in 
college (Holland & Nichols, 1964; Nichols & 
Holland, 1963). That the obtained relation- 
ships are often low may result in part from 
the highly selected nature of the samples. It 
is of particular interest that among the many 
predictors investigated—including aptitude 
and personality tests—the best predictor of 
creative achievement in college was creative 
achievement in the same area in high school 
(Holland & Astin, 1962; Nichols & Holland, 
1964). Even more striking is the finding that, 
in a large and representative sample of col- 
lege freshmen, it was the students with supe- 
rior high school grades who had most often 
won distinction for creative achievement in 
high school extracurricular activities (Werts, 
1966). Contrary to a prevalent view, academic 
aptitude was closely related to creativity, 
especially in scientific and literary fields.

Relevant biographical data have also been
obtained in studies employing interviewing or 
other intensive individual assessment pro- 
cedures with adults who have made creative 
contributions in the arts or sciences (Mac- 
Kinnon, 1962; Roe, 1951a, 1951b, 1953). 
Similar assessment techniques were utilized 
by Helson (1967) with college women identi- 
fied through faculty nominations and ratings 
of creative achievement in college. In the same 
series of studies, Helson (1965, 1966, 1967) 
gathered questionnaire data regarding child- 
hood interests and activities as recalled by her 
Ss. Finally, parental characteristics have been 
investigated in relation to children’s creativity 
as determined by either creativity tests or 
evaluated achievement (Domino, 1969; 
Dreyer & Wells, 1966; Helson, 1966, 1967; 
Weisberg & Springer, 1961). The Ss of these 
studies included school children, high school 
students, and college women. 

In an earlier study by the present writers 
(Schaefer & Anastasi, 1968), biographical in- 
ventory keys were developed in a group of 
400 high school boys against criteria of creative
achievement in (a) science and (b) art
or creative writing. Cross-validation yielded 
validity coefficients of .35 and .64 for the 
science and art-writing keys, respectively, both 
significant at the .001 level. In the present 
study, the same basic procedures were fol- 
lowed in developing biographical inventory 
keys for high school girls in creative art and 
creative writing. These two fields were chosen 
for further exploration because in the earlier 
study differentiation between creative and 
control groups was greater in the combined 
art and writing group than in the science 
group. Among high school girls, moreover, 
outstanding creative achievement in art or 
writing is more frequent than it is in science. 
As in the earlier study, a second major ob- 
jective of the present investigation was to 
utilize the differentiating biographical inven- 
tory items in formulating a description of the 
antecedents and correlates of creativity in 
this population. 


METHOD 
Subjects 


The Ss employed in the principal data analyses 
were 400 female students from seven public high
schools in greater New York.3 These schools were
chosen, first, because they offer courses or programs 
providing opportunities for creative activities and, 
second, because they have outstanding records of 
awards, prizes, and other indications of creative 
student achievement in art or writing. Of the 400 
Ss, 246 were seniors, 128 juniors, and 26 sophomores. 
The group as a whole was superior with regard to 
educational level of parents, slightly more than one- 
half of the fathers and one-third of the mothers 
having attended college for one or more years. 
While over half of the parents were born in New 
York City, nearly one-third were foreign-born. The 
most frequent national ancestries were Russian, Pol- 
ish, and German, in that order; 24 Ss were Negro. 

The total sample comprises four criterion groups 
of 100 students each, designated as follows: Crea- 
tive-Art (CrA), Control-Art (CoA), Creative-Writ- 
ing (CrW), and Control-Writing (CoW). For inclu- 
sion in a creative group, S had to meet two criteria: 
(a) teacher nomination on the basis of one or more 
creative products to be listed on a teacher nomina- 
tion form—any type of visual art or creative writing 
was acceptable for this purpose; (b) score above a 
minimum cutoff on Guilford Alternate Uses and 
Consequences tests. The control Ss were enrolled in 
the same courses from which the creative Ss were 
selected and were nominated by the same teachers 
as having provided no evidence of creative achieve- 
ment. They also scored below a maximum cutoff on 
the two Guilford screening tests. Within each field, 
creative and control groups were matched in school 
attended, class, and grade-point average. The 400 
Ss in the four criterion groups were selected from an 
initial pool of 1,114 nominees in the seven schools. 

It should be noted that the Guilford tests were 
employed only as a check on irrelevant factors that 
might have influenced the nomination of creative or 
control Ss. The scores on these tests were employed 
only to exclude cases, never to admit them. More- 
over, the two cutoff scores were sufficiently extreme 
as to exclude only those students whose test per- 
formance was highly discordant with their reported 
achievement. In terms of available published norms, 
the mean scores of the creative students on the two 
Guilford tests are approximately equal to those of 
college students, while the mean scores of the con- 
trol groups fall close to the ninth grade mean. 


Biographical Inventory 


Except for minor changes, the biographical in- 
ventory employed in this study was the same as 
that prepared in the earlier study of high school 
boys (Schaefer & Anastasi, 1968). The questions were 
originally formulated on the basis of hypotheses and 
published research findings regarding the correlates 


3 The authors gratefully acknowledge the cooperation
of J. Wayne Wrightstone, Assistant Superintendent,
Board of Education of the City of New
York, Nathan Brown, then with the Center for 
Urban Education, and the principals and participating 
teachers of the following high schools: Abraham Lin- 
coln, Art and Design, Erasmus Hall, Forest Hills, 
Jamaica, Midwood, and Music and Art.

of creativity. The 166 questions of this inventory are
grouped into five sections designated as physical
characteristics, family history, educational history, 
leisure-time activities, and miscellaneous. Most of the 
questions cover objective facts regarding present or 
past activities and experiences; some call for expres- 
sions of preference and others pertain to plans and 
goals. 

The inventory contains some multiple-choice and 
checklist items; but many questions are open-ended. 
Even with the objective items, moreover, there is 
usually provision for additional unlisted responses. 
Although scoring and data analysis are more difficult 
under these conditions, these types of items yield a 
richer return of information and are especially ap- 
propriate in an exploratory study. All responses 
were coded prior to tabulation. For each question, 
there were several possible responses, the number 
being quite large for some questions. In addition, 
several questions yielded responses that could be 
classified from different viewpoints to test different 
hypotheses. For example, a response to “List your 
present hobbies” could be scored with reference to 
number of hobbies or type of hobbies; and hobbies 
could be sorted into types according to several dif- 
ferent schemas. As a result, the 166 questions yielded 
a total of 3,962 “scorable items” or individual re- 
sponse alternatives employed in the item analysis. 


Procedure 


The biographical inventory, together with three 
tests employed in another part of the project, was 
administered by the same E to groups of 110-256 
students during a 2-hr. session held in the school 
buildings outside of school hours. The Ss were paid 
for participating in this testing session. Identifica- 
tion numbers were assigned to provide anonymity, 
and students were assured of the confidentiality of 
their responses. 

In the analysis of biographical inventory data, 
each of the four criterion groups was subdivided into 
two subgroups of 50, employed for development of 
scoring keys and cross-validation, respectively. Each 
pair of subgroups was equated in number of stu- 
dents from each school, class distribution, grade-point 
average, and mean score on the screening tests. For 
each of the 3,962 scorable items, classified as present 
or absent, a phi coefficient was computed against the 
dichotomous criterion of creative versus control. 
These coefficients were computed separately in art 
and writing criterion groups. All items with phi 
coefficients at the significance level of p < .20 or better
were considered for inclusion in the initial CrA
and CrW scoring keys. Some of these items were 
excluded because they duplicated other items, were 
checked by fewer than four Ss in either sub- 
group, or were inconsistent with other responses or 
with hypotheses and hence likely to have yielded 
isolated chance correlations. 
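
The item analysis described above can be illustrated with a short sketch. The text states only that a phi coefficient was computed for each scorable item against the creative versus control dichotomy. The chi-square significance test used here (chi-square = N times phi squared, 1 df) is an assumption about how the p values were obtained, and the function names and example counts are hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table of item response by criterion group.

    a: creative Ss giving the response   b: creative Ss not giving it
    c: control Ss giving the response    d: control Ss not giving it
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # item marked by everyone or by no one: no information
    return (a * d - b * c) / denom

def phi_p_value(phi, n):
    """Two-sided p for phi, assuming chi-square = n * phi**2 with 1 df."""
    chi_square = n * phi ** 2
    # For 1 df the upper-tail chi-square probability equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2.0))

# Hypothetical item: marked by 20 of 50 creative Ss and 10 of 50 control Ss.
phi = phi_coefficient(20, 30, 10, 40)
p = phi_p_value(phi, 100)
```

A positive phi indicates that the response is more frequent among creative Ss; the sign is what later determines whether an item is keyed positively or negatively.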

In the initial scoring keys, a weight of 1 was assigned
to items discriminating between the p < .20
and p < .05 levels, and a weight of 2 to items discriminating
at the p < .05 level or better. Items with
higher frequencies in the creative group received


TABLE 1

BIOGRAPHICAL INVENTORY SCORES OF CRITERION
GROUPS IN CROSS-VALIDATION SAMPLES

             Creative art key        Creative writing key
Score        Creative    Control     Creative    Control
             art         art         writing     writing

141-150         0           0           1           0
131-140         0           0           6           1
121-130         0           0           6           1
111-120         0           0           5           2
101-110         0           0          11           4
91-100          1           0          10           1
81-90           7           1           4          10
71-80           3           5           3           8
61-70          12           7           3          11
51-60          19          11           1           8
41-50           4          13           0           3
31-40           1           8           0           1
21-30           3           5           0           0

N              50          50          50          50
M              61.26       50.00      104.40       76.24
SD             15.65       15.04   [illegible]     20.87
Range          24-94       23-89       54-149      39-136
z                    3.67*                   6.63*
r_pb                  .34*                    .55*

Note. In order to eliminate negative scores, 50 was added
to each raw score. This adjustment, however, does not exclude
negative scores from the total possible range, which is -87 to
248 for the CrA key and -60 to 370 for the CrW key.
* p < .001.




positive weights; those with higher frequencies in 
the control group received negative weights. The 
initial CrA and CrW scoring keys were used in 
scoring the biographical inventories of the cor- 
responding creative and control Ss in the cross- 
validation samples. The scorers were unaware of the 
criterion status of Ss. The scores thus obtained were 
correlated with the dichotomous criterion to provide 
an estimate of the validity of the scoring keys. 
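
The weighting and scoring rules just described may be sketched as follows. The thresholds (.05 and .20), the sign convention, and the constant of 50 come from the text and the note to Table 1; the function names and the two item labels in the usage lines are hypothetical, and the point-biserial coefficient is computed simply as a Pearson r between scores and a 0/1 criterion.

```python
import math

def item_weight(p_value, creative_freq, control_freq):
    """Weight 2 if p < .05, weight 1 if .05 <= p < .20, otherwise 0.

    The weight is positive when the response is more frequent in the
    creative group and negative when it is more frequent in the controls.
    """
    if p_value < 0.05:
        magnitude = 2
    elif p_value < 0.20:
        magnitude = 1
    else:
        return 0
    return magnitude if creative_freq > control_freq else -magnitude

def score_inventory(responses, key, constant=50):
    """Sum the weights of keyed items present in an S's responses,
    then add the constant used to eliminate negative scores."""
    return constant + sum(w for item, w in key.items() if item in responses)

def point_biserial(scores, is_creative):
    """Pearson r between inventory scores and a 0/1 criterion."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(is_creative) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, is_creative))
    var_x = sum((x - mean_x) ** 2 for x in scores)
    var_y = sum((y - mean_y) ** 2 for y in is_creative)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical two-item key and two hypothetical response sets.
key = {"owns_microscope": 2, "no_instrument_at_home": -1}
score_a = score_inventory({"owns_microscope"}, key)
score_b = score_inventory({"no_instrument_at_home"}, key)
```

Because the key is built in one subsample and the correlation is computed in the other, the resulting coefficient is a cross-validated estimate rather than a fit to the development data.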

In order to utilize all the data in the selection of 
items for final scoring keys, item analyses were car- 
ried out independently in initial and cross-validation 
samples and those items were selected that differ- 
entiated between creative and control groups with a 
compound probability of .05 or better (Baker, 1952). 
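
The text does not describe how the compound probability was calculated beyond citing Baker (1952). As an illustration only, the sketch below substitutes Fisher's method for combining two independent p values; it should not be read as Baker's procedure. The requirement that an item differentiate in the same direction in both samples is likewise an assumption, and all names are hypothetical.

```python
import math

def fisher_combined_p(p_initial, p_cross):
    """Combine two independent p values by Fisher's method.

    X = -2 * (ln p1 + ln p2) is chi-square with 4 df under the null
    hypothesis. For 4 df the upper-tail probability has the closed form
    exp(-X / 2) * (1 + X / 2), which reduces to p1 * p2 * (1 - ln(p1 * p2)).
    """
    product = p_initial * p_cross
    if product <= 0:
        return 0.0  # guard against log(0) when either p is reported as 0
    return product * (1.0 - math.log(product))

def keep_item(p_initial, p_cross, same_direction, alpha=0.05):
    """Retain an item for the final key if both samples agree in direction
    and the combined probability is alpha or better."""
    return same_direction and fisher_combined_p(p_initial, p_cross) <= alpha

# Two samples each reaching p = .05 combine to well below .05,
# whereas two samples each at p = .10 do not.
strong = keep_item(0.05, 0.05, same_direction=True)
weak = keep_item(0.10, 0.10, same_direction=True)
```

Under this stand-in rule an item need not reach p < .05 in either sample alone, which is the sense in which the procedure uses all the data.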


RESULTS 


Application of the initial CrA and CrW bio- 
graphical inventory keys to the appropriate 
cross-validation samples yielded the data sum- 
marized in Table 1. Although there is con- 
siderable overlapping between the scores of 
creative and control groups, the means of 
both creative groups are significantly higher 
than those of the corresponding control groups
at the .001 level. Point-biserial correlations
between biographical inventory scores and the 
dichotomous criterion are .34 in the art group 
and .55 in the writing group. 
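
The art-key statistics can be checked against the means and standard deviations in Table 1. The sketch below assumes that z is the difference between independent means divided by its standard error, and that the tabled standard deviations were computed with N (not N - 1) in the denominator; neither assumption is stated in the text, and the function names are hypothetical.

```python
import math

def z_between_means(m1, s1, n1, m2, s2, n2):
    """Difference between two independent means over its standard error."""
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def r_pb_from_summary(m1, s1, n1, m2, s2, n2):
    """Point-biserial r recovered from group means and SDs.

    Total variance = weighted within-group variance + p * q * (m1 - m2)**2,
    and r_pb = (m1 - m2) * sqrt(p * q) / total SD.
    """
    n = n1 + n2
    p, q = n1 / n, n2 / n
    within = (n1 * s1 ** 2 + n2 * s2 ** 2) / n
    between = p * q * (m1 - m2) ** 2
    return (m1 - m2) * math.sqrt(p * q) / math.sqrt(within + between)

# Creative-art versus control-art values from Table 1 (N = 50 per group).
z_art = z_between_means(61.26, 15.65, 50, 50.00, 15.04, 50)    # about 3.67
r_art = r_pb_from_summary(61.26, 15.65, 50, 50.00, 15.04, 50)  # about .34
```

Both values agree with the tabled z of 3.67 and the reported correlation of .34 to two decimals, which suggests the summary statistics for the art key are internally consistent.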

At least two conditions imposed upon the 
selection of Ss tend to reduce the differences 
between creative and control groups. First, 
creative and control Ss were equated in grade- 
point average, although there is evidence that 
high school grades are in fact related to crea- 
tive achievement (e.g., Werts, 1966). Second, 
the creative and control Ss were enrolled in 
the same courses in art or writing and at- 
tended high schools noted for the creative 
achievement of their students. 

The second condition applies more strongly 
to the art than to the writing group, since 
a large proportion of Ss in the art sample 
were in special high schools whose students 
are selected on the basis of superior artistic 
talents. This fact is consistent with the finding 
that differentiation between creatives and con- 
trols was less sharp in the art than in the 
writing group. Not only were the mean differ- 
ence and the criterion correlation higher in 
the writing than in the art group, but the 
number of significantly differentiating items 
was also larger in the CrW key than in the 
CrA key—a difference that is reflected in the 
higher scores obtained with this key. In the 
light of these sample characteristics, it should 
be noted that the present study is concerned 
with the differentiating biographical charac- 
teristics of the more highly creative Ss within 
an academically superior and talented popula- 
tion. 

After the cross-validation of the initial bio- 
graphical inventory keys, final keys were con- 
structed with items whose compound prob- 
ability was derived from both initial and 
cross-validation samples. The CrA key thus 
developed contains 40 items, the CrW key 82 
items. An examination of these items pro- 
vides a description of the biographical cor- 
relates of creativity as revealed within the 
conditions of this study. 


DISCUSSION
Correlates of Creativity across Both Fields 


The most conspicuous characteristic of the 
creatives in both fields is a pervasive and continuing
interest in their chosen field and absorption
in its pursuit. Items in this category
include those dealing with favorite subjects 
in elementary school and high school; sub- 
jects found easy and those found difficult; 
nature of extracurricular activities in elemen- 
tary school and high school, as well as an- 
ticipated extracurricular activities in college; 
concentration of hobbies in one’s field of in- 
terest, as well as hobbies bearing a close rela- 
tion to vocational goal; and reported career 
plans. Strength of interest is also indicated 
by the significantly greater number of crea- 
tives than controls in both fields reporting 
that they frequently became so absorbed in 
a project that they missed a meal or stayed 
up late. 

Typically, the highly creative adolescent 
girl in this study had manifested an absorbing 
interest in her field since childhood and her 
creative activities had received recognition 
through exhibitions, publication, prizes, or 
awards. Her initial interest was thus re- 
warded and reinforced early in life by persons 
in authority, such as parents and elementary 
school teachers. The continuity of creative 
achievement over time is corroborated by the 
findings of other investigations, notably Hel- 
son’s (1965, 1967) research with college 
women, the surveys of National Merit Schol- 
arship finalists (Holland & Astin, 1962; 
Nichols & Holland, 1964), and our own 
earlier study of creative high school boys 
(Schaefer & Anastasi, 1968). 

Several significantly differentiating items 
suggest a predominance of unusual experiences 
in the backgrounds of the creatives as con- 
trasted with the controls. Thus the creatives 
were more likely than the controls to have 
had a variety of unusual experiences, to day- 
dream about unusual things, to have col- 
lections of an unusual nature (such as ant 
pictures, mushrooms, and mobiles), and to 
have experienced eidetic imagery or had 
imaginary companions in childhood. To some 
extent, these differences may indicate greater 
readiness to acknowledge unusual experiences 
on the part of the creatives and less reluctance 
to report them. It is also interesting to note 
that more creatives than controls in both fields 
reported unusual types of paternal discipline, 
other than those listed on the inventory form. 
One could speculate that the prevalence of
atypical experiences in their early life may
contribute to the low level of conformity and 
conventionality generally found to character- 
ize creative persons at all ages. 

Because of the selection procedures em- 
ployed, both creative and control groups 
tended to come from intellectually superior 
homes. Nevertheless, certain significant dif- 
ferences were found in the familial back- 
grounds of creatives and controls. In both 
creative groups, significantly more fathers had 
attended college, graduate school, or profes- 
sional schools than was true in the correspond- 
ing control groups. More controls than crea- 
tives reported that no musical instrument was 
played in the family. Since Ss were not se- 
lected for this study on the basis of musical 
achievement, this difference probably reflects 
the general cultural level of the home. Also 
relevant to general home conditions may be 
the fact that significantly more creatives than 
controls reported having two or more collec- 
tions. 

Earlier investigations have repeatedly found 
creativity to be related to parental educational 
and occupational level and to socioeconomic 
level of the home, whether Ss be distinguished 
scientists (Chambers, 1964) or creative high 
school students (Schaefer & Anastasi, 1968). 
Nor is the relationship limited to full-fledged 
creative achievement. Using performance on 
the Minnesota Tests of Creative Thinking as 
a criterion, Dauw (1966) found that high- 
scoring high school seniors had parents with 
better educational backgrounds and more pro- 
fessional and managerial occupations than did 
the low scorers. Similarly, in a study of 
seventh grade children subdivided on the basis 
of scores on an originality battery, socio- 
economic status yielded the largest group dif- 
ference of all variables investigated (Ander- 
son & Cropley, 1966). In explaining this 
finding, the authors refer first to typical 
lower-class parental attitudes that tend to 
evoke anxiety toward school learning and 
hence encourage convergent rather than 
divergent thinking. As a second reason, they 
cite the more varied and stimulating environ- 
ment provided by homes at higher socio- 
economic levels. In Parloff and Datta’s (1965) 
study of highly selected participants in the 
Westinghouse Science Talent Search, the
entire sample excelled above the general population
in socioeconomic level and in parental
educational and occupational level, although 
these variables were unrelated to the rated 
creativity of projects within the sample. 

With regard to parental influence on the 
creative high school girls in our study, the 
majority of items differentiating between 
creatives and controls refer to the father 
rather than to the mother. In our earlier 
study of high school boys, the reverse was 
true, more of the differentiating items pertain- 
ing to the mother. These findings are con- 
sistent with those reported by Dauw (1966) 
for high school seniors, by MacKinnon (1962) 
for creative male architects, and by Helson 
(1966, 1967) for creative women mathe- 
maticians and creative college women. In the 
study of women mathematicians, moreover, 
Helson (1966, p. 21) reports that “the 
creative women were judged by interviewers 
to have had more identification with their 
fathers than comparison subjects.” If such 
results truly indicate a greater influence of 
the opposite-sex parent on creative children, 
they may help to explain the finding that in 
their attitudes, interests, and problem-solving 
styles creative individuals show more traits 
of the opposite sex than do controls and gen- 
erally conform less closely to sex stereotypes 
(see e.g., MacKinnon, 1962). 


Differences between Creativity Correlates in 
Art and Writing Groups 


The CrA and CrW groups are not directly 
comparable because of differences in school 
and class distribution and grade-point average. 
As might be anticipated, the grades in the 
CrW group average significantly higher than 
those in the CrA group. In the present experi- 
mental design, each creative group was 
equated with its own control group in these 
variables. The question now to be considered 
is whether the characteristics that significantly 
differentiate CrA Ss from their own controls 
differ in any systematic way from those that 
significantly differentiate the CrW Ss from 
their controls. This question can be answered 
by examining the items in the final CrA and 
CrW keys. 

As previously noted, the CrW key contains 
about twice as many items as the CrA key. 
With few exceptions, these additional items
fall into a cluster indicative of strong intellectual
and “cultural” orientation and breadth
of interests, both in the student herself and in 
her home background. The fathers of the CrW 
girls, as compared with those of the controls, 
are more likely to have one or more hobbies, 
frequently of an artistic or literary nature. 
Magazines regularly available at home are 
more likely to be of the cultural-intellectual
types. The student herself is more likely to 
own classical records, attend concerts, and 
read more than 10 books a year, preferably in 
science, science fiction, philosophy, languages, 
or history. She regularly reads more than 
two sections of a newspaper, including edi- 
torials. She frequently visits art museums and 
galleries, has received lessons in arts or crafts, 
and has a large number of hobbies, beginning 
in childhood, to which she now devotes over 
5 hr. a week. She reports owning a microscope 
more often than do the controls. In high 
school, she participates more extensively in 
extracurricular activities and anticipates more 
participation in college. Her college plans are 
more fully developed and ambitious. In com- 
parison to the controls, the CrW student is 
more often considering two or more colleges, 
usually including an Ivy League or small 
private college, and is less often considering 
a public city college. 

It is noteworthy that the breadth of inter- 
ests and intellectual orientation characterizing 
the CrW girls was found in both creative 
groups of boys in our earlier study (Schaefer 
& Anastasi, 1968). One of these groups was 
selected because of creative achievement in 
science, the other because of creative achieve- 
ment in art or writing. The latter group, how- 
ever, included 76 boys in creative writing 
and only 24 in art. It is thus likely that the 
similarity of this group to the CrW girls
resulted from the predominance of creative 
writing cases within it. 

When the results of the two studies are 
considered together, they indicate that the 
biographical correlates of creativity are closely 
similar for boys and girls, with the possible 
exception of the reversal of role model and 
the greater influence of the opposite-sex parent 
upon the creative offspring. With regard to 
field of creative achievement, certain char- 
acteristic differences emerge among science, 
writing, and art. Cutting across both sex and
field, however, are certain common characteristics
of creative adolescents: continuity
and pervasiveness of interest in chosen field; 
prevalence of unusual, novel, and diverse 
experiences; and educational superiority of 
familial background.


This study compares the perceptions of need satisfactions of (a) American
managers currently working in overseas locations as a function of their posi- 
tion in the organizational hierarchy and (b) the overseas American managers 
and their domestic counterparts. A need satisfaction questionnaire (Porter, 
1961), previously employed in numerous studies to test the Maslow need- 
hierarchy concept, was used to collect the data from 127 overseas Americans. 
The domestic manager data used for comparisons were collected in a study by 
Porter (1963b). The findings indicate that no matter how Ss were classified, 
job level or domestic and overseas, the Autonomy and Self-actualization need
categories are the least fulfilled.


Studies have been conducted recently using 
industrial managers, government managers 
(Edel, 1967; Paine, Carroll, & Leete, 1966) 
and union officers (Miller, 1964) as Ss in 
determining need satisfactions. Each of the 
studies used Ss who were working within the 
United States or who were nationals of foreign 
nations working in their native country. For 
example, a series of studies reported by 
Porter (1963a) found that line managers 
derive more need satisfaction than staff man- 
agers from their job, that the higher a man- 
ager is in the organizational hierarchy the 
more need satisfaction attained (Porter, 
1962), and that high-level managers in large 
organizations have more need satisfaction 
than high-level managers in small firms but 
the reverse is true for lower level managers 
(Porter, 1963b). 

The Porter (1961) need satisfaction ques- 
tionnaire was employed to study 3,641 man- 
agers from around the world (Haire, Ghiselli, 
& Porter, 1966). These managers were from 
14 different countries. This investigation was 
concerned with manager attitudes and how 
they were similar or different among coun- 
tries. One of the general findings of this job 
attitude study was that over all countries in- 
cluded, two needs stand out as largely un- 


1 The author wishes to acknowledge the cooperation
of Lyman W. Porter in providing statistical
data so that comparisons could be made. 

2 Requests for reprints should be sent to the 
author, Department of Administrative Science and 
Quantitative Methods, Commerce Building, Uni- 
versity of Kentucky, Lexington, Kentucky 40506. 


satisfied—Autonomy and_ Self-actualization. 
The deficit in satisfaction of these two needs 
combined is more than twice as large as the 
dissatisfaction in the three other need areas, 
Security, Social, and Esteem. 

Each of the managers studied by Haire,
Ghiselli, and Porter (1966) was working
within his own country. For example, the
American managers included worked within
the United States and the Japanese respondents
were working in Japan. There are
presently no empirical studies available in
the current literature which compare domestic
and overseas American managers at similar
organizational levels on the need satisfaction
derived from their job.

In an attempt to fill in some of the gaps in 
previous research involving managerial moti- 
vation, the present study makes use of Amer- 
ican managers currently working overseas and 
study results reported by Porter (1963b) 
involving domestic managers. Attempts to de- 
termine if the American manager’s level in 
the overseas managerial hierarchy influences 
his perceptions of need satisfaction are lack- 
ing in the literature. In addition, it has not 
yet been demonstrated empirically that there 
are differences or similarities in the need satis- 
faction opportunities of managers working 
within or outside of the United States. 

If those responsible are to do the best pos- 
sible job recruiting and selecting American 
managers for overseas assignments, it would 
be desirable to know what features of the 
overseas job are most satisfying and which 
are least satisfying. For example, if it is
found that the overseas job offers optimum
opportunities for satisfaction of the Esteem
needs of the manager when compared with a
domestic position, it may be advantageous to
secure overseas managerial candidates who
value Esteem need satisfaction above any
other.


METHOD 
Sample 


The study is divided into two major phases. 
First, data were obtained by administering Porter’s
(1961) need satisfaction questionnaire to top and
middle managers currently working abroad representing
large United States business corporations.
The top management group includes presidents 
and vice-presidents and the middle management 
group consists of division, plant, and major depart- 
mental managers. Usable replies were received from 
78 top managers and 49 middle managers. 

The managers studied were randomly selected 
from name lists of American managers currently 
overseas working for the largest United States in- 
dustrial corporations. These lists were compiled by 
using Fortune’s “500,” The Directory of Firms 
Operating in Foreign Countries (Angel, 1966), and 
names submitted by a number of consulting agencies. 
The United States executives included in this study 
have permanent assignments outside the geographical 
boundaries of the United States. Each of the managers
has only minimal contact with his home
office. They interact primarily with the nationals
of the host nation or with third country nationals. 

The domestic manager need satisfaction data for
the second phase, which were needed to make comparisons
between domestic and overseas managers,
were obtained from a study reported by Porter
(1963b). He administered his satisfaction questionnaire
to several thousand individuals in management
positions in firms located throughout the continental
United States. These individuals represented a random
10% sample of the American Management Association
and a random sampling of a nonmember
mailing list of the Association. The questionnaires
were mailed to top-, middle-, and lower level managers
in large-, medium-, and small-sized companies.
Only the top- and middle-level managers in large
organizations of the Porter (1963b) study are used
to make comparisons with the present investigation.


Questionnaire


The amount of need satisfaction experienced by 
management respondents in the Porter (1963b) 
study and the present study for each of 12 need 
items was determined by subtracting the responses 
to Part a of the item (How much is there now?) 
from the responses to Part b of the item (How 
much should there be?). 

Individual scores were then averaged so that comparisons
of the two groups on each need item could
be made. A two-tailed t test was used to test
for statistically significant differences in perceived
need satisfaction between top and middle managers
and between domestic and overseas managers.
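The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings are invented, a 7-point response scale is assumed, and the pooled-variance form of the t statistic is assumed because the text does not say which form was used. The function names `deficiency_scores` and `pooled_t` are hypothetical.

```python
import math
from statistics import mean, variance


def deficiency_scores(now, should_be):
    """Part b ("How much should there be?") minus Part a ("How much is there now?")."""
    return [b - a for a, b in zip(now, should_be)]


def pooled_t(x, y):
    """Student's t statistic for two independent samples (pooled variance assumed)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


# Hypothetical ratings on one need item (assumed 7-point scale), five managers per group.
top_now, top_should = [5, 6, 5, 6, 4], [6, 6, 6, 7, 5]
mid_now, mid_should = [4, 5, 3, 5, 4], [6, 6, 5, 6, 6]

top_def = deficiency_scores(top_now, top_should)  # larger value = less satisfaction
mid_def = deficiency_scores(mid_now, mid_should)

# Sign convention (middle minus top) is an arbitrary choice for this sketch.
t = pooled_t(mid_def, top_def)
print(f"mean deficiency: top = {mean(top_def):.2f}, middle = {mean(mid_def):.2f}, t = {t:.3f}")
```

A larger mean deficiency indicates less perceived need satisfaction, which matches the direction of the scores reported in the tables.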


Categories of Needs and Specific Item Descriptions


Listed below are the specific need items and 
major need categories used in both studies. The items 
were listed in a random fashion in the question- 
naires, but are listed here in systematic order which 
corresponds to the Maslow-type (1943) need frame- 
work. 


1. Security need 

a. the feeling of security in my management
position
2. Social needs 

a. the opportunity, in my management position, 
to give help to other people 

b. the opportunity to develop close friendships 
in my management position 

3. Esteem needs 

a. the feeling of self-esteem a person gets from 
being in my management position 

b. the prestige of my management position in- 
side the company (that is, the regard re- 
ceived from others in the company) 

c. the prestige of my management position out- 
side the company (that is, the regard received 
from others not in the company) 

4. Autonomy needs 

a. the opportunity for independent thought and 
action in my management position 

b. the authority connected with my manage- 
ment position 

c. the opportunity, in my management position, 
for participation in the settings of goals 

5. Self-actualization needs 

a. the opportunity for personal growth and 
development in my management position 

b. the feeling of self-fulfillment a person gets 
from being in my management position (that 
is, the feeling of being able to use one’s own 
unique capabilities, realizing one’s poten- 
tialities) 

c. the feeling of worthwhile accomplishment in 
my management position 


RESULTS 


Table 1 compares the need satisfaction
and need category cluster scores of the top
and middle manager respondents currently
overseas. The individual item need satisfaction
scores (i.e., the 12 item scores) indicate
that top managers are significantly more
satisfied than middle managers with respect
to their need for prestige inside the company
and opportunity to participate in goal setting.
The top managers report more satisfaction
than middle managers in 11 of the 12 need
items.


TABLE 1

AVERAGE NEED SATISFACTION AND CLUSTER SCORES OF OVERSEAS AMERICAN MANAGERS STUDIED:
TOP VERSUS MIDDLE MANAGERS

                                                          Mean scores:    Mean scores:
Need categories and items                                 top managers    middle managers   t ratio
                                                          (N = 78)        (N = 49)

1. Security need
   a. (security in job)                                    .413            .653             1.669
2. Social needs
   a. (opportunity to help people)                         .364            .428             [illegible]
   b. (opportunity for friendships)                        .310            .489             1.230
3. Esteem needs
   a. (feeling self-esteem)                                .240            .408             1.519
   b. (prestige inside company)                            .256            .55              2.410*
   c. (prestige outside company)                           .243            .367             1.176
4. Autonomy needs
   a. (opportunity for independent thought and action)     .472            .653             1.748
   b. (authority in position)                              .662            .979             1.152
   c. (opportunity to participate in goal setting)         .405            .78              2.279*
5. Self-actualization needs
   a. (opportunity for personal growth and development)    .680            .653             -.168
   b. (feeling self-fulfillment)                           .893           1.102             1.024
   c. (feeling of worthwhile accomplishment)               .770           1.040             1.737

Cluster scores by rank
   Esteem                                                  .247            .442
   Social                                                  .338            .459
   Security                                                .413            .653
   Autonomy                                                .514            .803
   Self-actualization                                      .781            .932

Note. The larger the numerical values the less perceived need satisfaction.

* p < .05 as determined by two-tailed t test.


The cluster scores for each category are
developed so that comparisons of the overall
scores can be readily made. The mean
cluster scores are presented in rank order
in Table 1. Table 1 shows that the Esteem
and Social needs are the most satisfied and
that the largest need deficiencies for both
groups are reported in the Self-actualization
category. This finding generally agrees with
the results of earlier studies (Miller, 1964; Paine,
Carroll, & Leete, 1966; Porter, 1963b).
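The text does not state how the cluster scores were computed. A minimal sketch, assuming each cluster score is the unweighted mean of its item means, reproduces the printed top-manager values for three categories to within about .001 (the small gaps presumably reflect rounding of the item means). The dictionary names are hypothetical; the numbers are the top-manager entries of Table 1.

```python
from statistics import mean

# Item means for overseas top managers, as printed in Table 1.
top_items = {
    "Social": [0.364, 0.310],
    "Esteem": [0.240, 0.256, 0.243],
    "Autonomy": [0.472, 0.662, 0.405],
}
# Cluster scores for the same group, as printed in Table 1.
printed_cluster = {"Social": 0.338, "Esteem": 0.247, "Autonomy": 0.514}

# Assumption: a cluster score is the simple mean of the item means in that category.
cluster = {name: mean(values) for name, values in top_items.items()}

# Rank from smallest deficiency (most satisfied) to largest.
ranked = sorted(cluster, key=cluster.get)

for name in ranked:
    print(f"{name:<10} computed {cluster[name]:.3f}   printed {printed_cluster[name]:.3f}")
```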

The data presented in Table 2 compare
the need satisfaction and need cluster scores
of domestic (Porter, 1963b) and overseas
American managers. Examination of the top
manager need item scores shows that the overseas
respondents report more satisfaction on
8 of the 12 items.


The middle manager scores in Table 2 indicate
that the overseas manager perceives
more satisfaction than his domestic counterpart
on 6 of the 12 items. The largest difference
in average need satisfaction occurs in
comparing the 1a (security in job) scores.

The need cluster scores presented in Table 2 
show that overseas top managers report more 
satisfaction than domestic top managers in 
all categories but the Social, while overseas 
middle managers are more satisfied than their 
domestic counterparts in three of the five cate- 
gories. Both the domestic and overseas top 
and middle management groups indicate that 
the Autonomy and Self-actualization need 
categories are the least satisfied. 


Discussion 


The results presented in Table 1 indicate
that for the overseas managers included in
this study there exists a relationship between
the overseas manager’s vertical level in the
organization hierarchy and the opportunity to
satisfy only two specific need items. The top
managers reported significantly more prestige
and goal setting opportunities within the
company than the middle managers. In most
need item scores only relatively small differences
were found between the two management
levels. This finding is contrary to some
of the research findings of studies that investigate
the relationship of perceived need
satisfactions and organizational level of
domestic managers (Edel, 1967; Porter,
1962).


TABLE 2

AVERAGE NEED SATISFACTION AND CLUSTER SCORES OF TOP AND MIDDLE MANAGERS
OVERSEAS VERSUS TOP AND MIDDLE MANAGERS IN THE UNITED STATES

                                                                 Top managers                       Middle managers
                                                        Porter:                            Porter:
Need categories and items                               domestic   Overseas   t ratio      domestic   Overseas   t ratio
                                                        (N = 136)  (N = 78)                (N = 268)  (N = 49)

1. Security needs
   a. (security in job)                                  .36        .41        -.37         .28        .65       -2.39*
2. Social needs
   a. (opportunity to help people)                       .47        .36         .98         .39        .43        -.34
   b. (opportunity for friendships)                      .06        .31       -2.31*        .25        .49       -1.51
3. Esteem needs
   a. (feeling of self-esteem)                           .63        .24        5.51*        [illegible] .41       2.47*
   b. (prestige inside company)                          .39        .26        1.23         .56        .55         .08
   c. (prestige outside company)                         .23        .24        -.10         .34        .37        -.28
4. Autonomy needs
   a. (opportunity for independent thought and action)   .49        .66       -1.32         .62        .98       -2.21*
   b. (authority in position)                            .56        .47         .78         .92        .65        1.63
   c. (opportunity to participate in goal setting)       .69        .41        2.30*        .97        .78        [illegible]
5. Self-actualization needs
   a. (opportunity for growth and development)           .80        .68         .85         .88        .65        1.59
   b. (feeling of self-fulfillment)                      .83        .89        -.38        1.00       1.10        -.52
   c. (feeling of accomplishment)                        .85        .77         .61        1.15       1.04        [illegible]

Cluster scores                                          Domestic   Overseas                Domestic   Overseas
   Security                                              .36        .43                     .28        .65
   Social                                                [illegible] .34                    .32        .46
   Esteem                                                .42        .25                     .54        .44
   Autonomy                                              .58        .51                     .84        .80
   Self-actualization                                    .83        .78                    1.01        .93

Note. The larger the numerical values the less perceived need satisfaction.

* p < .05 as determined by two-tailed t test.


The present study’s results, when compared
with the findings of Porter (1963b), show
some differences in opportunities to satisfy
specific needs for domestic and overseas
managers. The domestic managers (Porter,
1963b), at both levels of the managerial
hierarchy, reported that the need for security
was highly fulfilled. It is also found that the
social needs of domestic managers are generally
more satisfied than those of their overseas
counterparts. The overseas managers, however,
perceived more esteem need satisfaction
than domestic managers.

Because the overseas American managers 
are working in a foreign location away from 
critical scrutiny of the home office, it may be 
postulated that they have more autonomy 
than managers working in the United States. 
If more self-reliance exists in overseas assign- 
ments, then the American executive abroad 
should report at least as much satisfaction 
of the Autonomy need items as do the 
domestic managers. Table 2 illustrates that
this is not the case with respect to every need 
item studied. 

The Autonomy and Self-actualization needs
appear to be the most critical areas of need
fulfillment deficiency at all levels of management
for both domestic and overseas American
executives. This finding agrees with the
Haire, Ghiselli, and Porter (1966)
international study results, which showed that
managers throughout the world have not
been able to satisfy the Autonomy and
Self-actualization needs to their fullest.
SOURCES OF VARIATION IN JOB AND LIFE SATISFACTION:
THE ROLE OF COMMUNITY AND JOB-RELATED VARIABLES 


CHARLES L. HULIN 1


University of Illinois, Urbana 


Several hypotheses relevant to the analysis of the effects of community charac- 
teristics on job satisfaction were tested. The Ss were 390 male and 80 female 
white-collar workers employed by the same company and living in two com- 
pany towns in Canada. The two towns differed along certain dimensions. 
Predictions were made regarding the differences in reactions by the workers 
to these two communities. Predictions were also made regarding the relation- 
ship between responses to the communities and responses to general job and 
life satisfaction. Sex differences were present but the data supported the 
hypotheses. A discussion of the relevance of these data for job satisfaction and 


motivation theory is presented. 


Several articles have appeared recently 
which analyze the role played by environ- 
mental variables in determining job satisfac- 
tion and motivation. These studies (Blood & 
Hulin, 1967; Hulin, 1966; Hulin & Blood, 
1968; Katzell, Barrett, & Parker, 1961; Ken- 
dall, 1963; Turner & Lawrence, 1965) used 
traditional S-R paradigms. In some of these 
studies characteristics of the community in 
which the plant being studied was located 
were assessed by means of data from census 
tracts (to index such variates as cost of liv- 
ing, standard of living, slums, extent of urban- 
ization, etc.) or state-published population 
figures (to index town size). 

These community characteristics have been 
used to predict either workers’ satisfaction 
with various aspects of the job, behaviors in 
the job situation, or the relationship between 
satisfaction and job characteristics. The re- 
sults indicate that community characteristics 
can be used very effectively as predictors of 
mean responses to the job and as moderators 
of relationships between job characteristics 
and worker responses. Individual differences
in preferences for work role outcomes can
be predicted using such community variables
(Blood & Hulin, 1967; Hulin & Blood, 1968;
Turner & Lawrence, 1965). These significant
predictions of preferences for work role outcomes
suggest reasons for the futility of the
search for general laws of satisfaction and
motivation in industrial psychology. No
longer can general hypotheses be formulated
which state that “Workers want larger jobs,
more responsibility, more autonomy, opportunities
for self-actualization, etc.”

1 The author would like to thank H. Peter Dachler,
Linda Yarham, and the officials of the company involved
for their help and cooperation in the research
described in this paper and George Graen and
Harry Triandis who read and commented on an
earlier draft. Requests for reprints should be sent to
the author, Department of Psychology, University
of Illinois, Urbana, Illinois 61801.

Of equal interest are the predictions of 
satisfaction levels from selected community 
characteristics (Hulin, 1966; Kendall, 1963). 
These latter two studies yielded results con- 
sistent with the frame of reference or adapta- 
tion level hypothesis of job satisfaction which 
suggests that the workers’ responses to vari- 
ous levels of environmental return are in- 
fluenced as much by what they see other 
workers getting as by what they get (Smith, 
Kendall, & Hulin, 1969). Community char- 
acteristics were chosen which were theoreti- 
cally related to the frame of reference of the 
individual workers for judging the quality of 
any given level of environmental return. That 
is, cost of living and standard of living in the 
communities were chosen to index the work- 
ers’ economic frame of reference. The results 
confirmed the economic frame of reference 
hypothesis in that workers in communities 
with high costs and standards of living were 
less satisfied with their pay even though in 
Hulin’s study pay was constant across communities.
The general effects of community
characteristics on job attitudes are also
demonstrated.

The assumption made in all of these studies
is that community characteristics are indexes 
of a certain class of psychological variables. 
The economic characteristics of the commu- 
nities (cost of living, standard of living, etc.) 
do not have a direct effect on the pay satis- 
faction of the workers in the community. 
Rather, it is assumed that the effects of the 
economic standards of the community are 
mediated through intervening psychological 
variables (frames of reference, adaptation 
levels) to produce their impact on pay satis- 
faction. However, this assumption has not 
been tested in any of the research published 
to date. The actuarial relationship between 
environmental characteristics on the one hand 
and job attitudes on the other is all that has 
been shown. 

In order to test the assumptions made by 
Kendall and Hulin, one should sample work- 
ers from at least two communities. Measures 
of satisfaction with the job should be obtained 
from these samples of workers as well as their 
verbalizations regarding cost of living, what 
is regarded as an adequate wage, other jobs 
in the community, etc. From such data one
would expect (a) across communities there
would be an effect on job satisfaction similar
to that observed by Hulin (1966) and Kendall
(1963), (b) across communities there
should be an effect on the verbalizations which 
are supposedly tapping the intervening vari- 
ables, and (c) within communities there 
should be an effect on job satisfaction due to 
the differing responses made by the individual 
workers to the same situations. That is, 
workers who verbalize that their community 
has a high cost of living, for example, should 
be less satisfied with wages than those who 
say that the cost of living is low. 
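The three expectations can be illustrated with a small Python sketch. Everything here is hypothetical: the eight workers, their scores, and the helper names `pearson_r`, `town_scores`, and `avg` are invented for illustration and are not the study's data or analysis. Expectations (a) and (b) are shown as differences between community means, and (c) as a correlation computed within each community.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical workers: (town, satisfaction with cost of living, pay satisfaction).
workers = [
    ("A", 2, 20), ("A", 3, 26), ("A", 4, 30), ("A", 1, 18),
    ("B", 4, 34), ("B", 5, 40), ("B", 3, 30), ("B", 5, 38),
]


def town_scores(town):
    """Return the two score lists for one town."""
    rows = [w for w in workers if w[0] == town]
    return [w[1] for w in rows], [w[2] for w in rows]


def avg(values):
    return sum(values) / len(values)


summary = {}
for town in ("A", "B"):
    cost_sat, pay_sat = town_scores(town)
    summary[town] = {
        "mean_pay_sat": avg(pay_sat),         # (a) job satisfaction differs across towns
        "mean_cost_sat": avg(cost_sat),       # (b) the verbalized response differs across towns
        "r_within": pearson_r(cost_sat, pay_sat),  # (c) relationship within a town
    }
    print(town, summary[town])
```

A positive within-town correlation is what expectation (c) predicts: workers who rate the cost of living more favorably should also be more satisfied with pay, even though the community itself is held constant.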

The research described in this paper was 
designed to test some of the assumptions made 
by Kendall and Hulin and the generality of 
the relationships between community charac- 
teristics and job satisfaction. 


METHOD 
Research sites 


The research was conducted in two “company” 
towns in British Columbia, Canada. The larger of 
the two communities (Town A) has a population of 
10,000 people. Although the company no longer owns
the stores, service areas, and individual houses, it
does own all of the surrounding land, and building 
lots must be purchased from the company. Building 
plans for individual houses must be approved by a 
committee of company executives. The commercial 
establishments of the town, while technically inde- 
pendent of the company, are in fact regulated by 
company policy. While most of the more distasteful 
accoutrements of the traditional company town are 
absent (Allen, 1966), this town is so strongly domi- 
nated by one company and has enough of the char- 
acteristics to be considered a company town. 

Town A is relatively isolated from the remainder 
of British Columbia. For example, it is located at a 
distance requiring a 2-hr. plane ride (if the plane 
flies), a long two-day drive, or a 30-hr. boat trip 
from the nearest major city. The physical setting of 
Town A is spectacularly beautiful. The town’s public 
facilities are generally excellent. For example, the
town has a 350 bed hospital which was built by the 
company and sold to the community for a nominal 
sum. All of the town’s streets are paved and all of 
the houses are on a sewer system and town water 
supply. Ten doctors (eight general practitioners and 
two obstetric and gynecology specialists) practice in 
the community. In contrast, there are only two 
dentists. The physical facilities in the community’s 
educational system are also outstandingly good. The 
schools are located throughout the community and 
none of them is crowded. The adequacy of the 
recreational facilities in the community varies de- 
pending on the age of the participant. There is the 
usual collection of curling sheets, hockey rinks, soft- 
ball and baseball diamonds, and football fields in 
the community. Many of these were built by the 
company. However, for the very young children and 
the post-high-school inhabitants, organized recrea- 
tional facilities are inadequate. Recreational oppor- 
tunities are plentiful for the adults if they appreciate 
hunting, fishing, camping, and boating. However, 
beyond this, there is very little. Fancy restaurants 
and night clubs are totally lacking. There is one 
movie theatre which operates three nights a week. 
As a side comment, it should also be pointed out 
that crime, juvenile delinquency, unemployment, and 
air pollution are also absent and the community en- 
joys one of the highest standards of living in Canada. 

The other of the two communities (Town B) has 
a population of approximately 200 people. It is 
located 50 mi. by water from the larger community. 
There are no roads in or out of this town. Trans- 
portation in and out is either by a twice-weekly 
boat or, in case of emergency, by one of the heli- 
copters stationed there. The physical setting for the 
smaller community is equally spectacular. It is 
located in a narrow canyon at the confluence of two 
mountain streams. There are notable differences, 
however, between this community and the larger 
community. In Town B all the houses are owned by 
the company and rented to the inhabitants. This is 
not a great hardship since a three-bedroom house 
rents for approximately $35 per month. There is 
only one store in town which is owned by the 
company, but leased to a third party. The company
controls the profits of this third-party lessee. There
is only one eight-grade school in this smaller town,
so older children must be boarded in the
larger community during the school year, where they
attend high school. Families with high school age
children receive a boarding allowance. The town
has a small dispensary instead of a hospital with one 
public nurse fulfilling the role of town medical 
agent. There are no dentists or dental facilities in 
this community. These two communities, then, while
both are towns controlled by the same company,
do differ in significant ways.

Such research sites ensure that the workers sampled
will be responding to the same community
characteristics and not to characteristics of a suburb 
or satellite community. Also, all the workers are 
employed by the same company. The physical en- 
vironment has been controlled and most of the un- 
wanted variation in the social and economic environ- 
ment has been effectively removed. Workers’ re- 
sponses in such communities would provide a partial 
test of Kendall’s and Hulin’s assumptions. 


Subjects 


All of the salaried workers in both communities 
were requested to take part in this study. This in- 
cluded everybody from the first line supervision to 
the plant manager. The questionnaires were actually 
completed by 76% of the salaried workers in the 
communities. This included 442 workers from Town 
A and 28 workers from Town B. The questionnaires 
were given to the individual workers by work-group 
representatives who had met previously with the 
investigator. The individual workers were requested 
to complete the questionnaire on company time and 
send it through company mails to the investigator 
or mail it directly to the University of Illinois. 
Anonymity was guaranteed to all the workers in the 
sample. The investigator remained in the community 
and worked out of an office remote from the per- 
sonnel department for a period of 10 days. Any 
workers who wished to talk to the investigator were 
urged to do so. A great number of the workers did. 
The characteristics of the two samples of workers 
are given in Table 1. 


Variables 


On the face sheet of the questionnaires, questions 
were asked regarding the respondent’s age, the de- 
partment in which he worked, sex, years of formal 
education, length of service with the company, 
length of service at present location, job title, 
whether the worker was on shift work or day work, 
and work location (community). The job title was 
used to obtain measures of the worker’s job level 
and salary. 

Measures of the workers’ satisfaction with actual
work done, pay, promotional opportunities, supervision,
and co-workers were obtained by means of
the Job Descriptive Index (JDI). The JDI is a
cumulative point adjective check-list measure of job
satisfaction which possesses adequate convergent and
discriminant validity for individual analysis (Quinn
& Kahn, 1967, p. 456; Smith, 1967; Vroom, 1964,
p. 100). Three additional job-related satisfaction
variables were assessed by means of the 5-point
graphic rating scale described below. The three additional
variables were satisfaction with management’s
response to complaints, satisfaction with
training opportunities, and satisfaction with working
conditions.


TABLE 1

SAMPLE CHARACTERISTICS OF WORKERS FROM TWO COMMUNITIES

                                   Town A (N = 442)            Town B (N = 28)
Characteristic                     Mean       s                Mean          s

Age                                39.71      11.06            39.79         12.85
Sex (1 = male, 0 = female)           .83        .14            [illegible]     .19
Tenure with company                 8.94       5.71             6.19          5.28
Tenure at location                  6.55      [illegible]       5.12         [illegible]
Shift (0 = shift, 1 = days)          .84        .14              .93           .06
Yrs. education                     12.5       [illegible]      [illegible]   [illegible]


Two measures of overall satisfaction were also 
obtained. Each worker’s satisfaction with his job in 
general (JIG) and life in general (LIG) was esti- 
mated by means of the General Motors Faces Scale 
(Kunin, 1955). In this particular study 7-point ver- 
sions of the original 11-point scale were used. Each 
of these two general satisfaction scales consisted of 
three smiling faces, one neutral face, and three 
scowling faces. The workers were asked to indicate 
how they felt about their job in general or life in 
general, considering everything about their present 
situation, by checking the appropriate face. 

Workers’ responses to their communities were as- 
sessed by 5-point graphic rating scales. The scale 
points were labeled Very Dissatisfied, Somewhat Dis- 
satisfied, Neither Satisfied nor Dissatisfied, Somewhat 
Satisfied, and Very Satisfied. The workers were asked 
to indicate their degree of satisfaction with each of 
the characteristics listed. They were asked to place a 
check mark on the line at the point which indicated 
their degree of satisfaction. They were told that the 
mark could be made at any point along the line. 
These scales were used to assess workers’ satisfaction 
with medical facilities, school facilities, the weather, 
availability of living accommodations, availability 
of doctors, recreational facilities, teachers in the 
school system, availability of dentists, shopping fa- 
cilities, cost of living, recreational facilities for chil- 
dren, cost of housing, location of the community in 
terms of its isolation and remote location, and the 
attractiveness of the community as a place to live. 
The same 5-point graphic rating scale was used to 
assess the workers’ satisfaction with management’s 
responsiveness to complaints, satisfaction with training
opportunities, and satisfaction with physical conditions
at their place of work.

Finally, the importance each worker ascribed to 
these same community and job characteristics in de- 
termining the way he felt generally was assessed by 
means of similar 5-point graphic rating scales. The 
intervals of these 5-point scales were labeled Very 
Important, Important, Only Moderately Important, 
Unimportant, Very Unimportant. In the present study 
only the analysis of the workers’ satisfaction scores 
will be reported. 


Hypotheses 


Examination of the assumptions made by Hulin 
and Kendall in their analyses of the effects of com- 
munity characteristics on job satisfaction involves 
testing two interrelated sets of hypotheses. The first 
set of hypotheses is concerned with the differences 
in the workers’ reactions to the two communities. 
The second set of hypotheses is related to the rela- 
tionships between the workers’ reaction to the com- 
munities and their job satisfaction within the two 
samples. Since the two communities involved in this 
research differ on known characteristics it is pos- 
sible to formulate a priori hypotheses regarding dif- 
ferences in the workers’ reactions to the character- 
istics of the communities. Many of these hypothesized 
differences are based on an enumeration of the 
physical characteristics of the communities. Such 
differences may be revealed, for example, by the 
usual census-type data. Other hypotheses are based 
on the author’s reaction to the two communities and 
on conversations with the workers. 

It would be expected that the workers in Town A 
would be more satisfied with the medical facilities, 
school facilities, availability of doctors, availability 
of dentists, and shopping facilities than would the 
workers in Town B. On the other hand, the workers 
in Town B should be more satisfied with the availability
of housing, the cost of living, and the cost of
housing. No differences were hypothesized for the
remaining variables. 

Confirmation of these expected differences in the 
two mean vectors of responses would be evidence 
that the workers were responding in the expected 
manner to community characteristics. Further, it 
would indicate that the hypothesized intervening 
variables through which community characteristics 
have their effect on job satisfaction were being af- 
fected in the appropriate direction by differences in 
community characteristics. 

The second set of hypotheses is concerned with 
the relationships between the workers’ responses to 
their communities and their satisfactions with specific 
aspects of the job. The only hypotheses which can 
be made with any degree of certainty involve the 
relationship between the workers’ satisfaction with 
pay and their response to the economic factors of 
the community. Therefore, we would hypothesize that 
there would be positive relationships between the 
workers’ pay satisfaction and their satisfaction with 
the availability of housing, satisfaction with shopping 
facilities, satisfaction with cost of living, and satisfaction
with cost of housing. There will probably be
other relationships between responses to aspects of 
communities and specific aspects of job satisfaction 
but no predictions can be made at this time. Con- 
firmation of this set of hypotheses would be evidence 
supporting the mediating effects of satisfaction with 
community characteristics on job satisfaction. 

Finally one can attempt to predict the two mea- 
sures of overall satisfaction using the complete set 
of community satisfaction measures and job satis- 
faction measures as predictors. While such an analysis 
is not necessary for confirmation of the frame of 
reference hypothesis, it does provide evidence re- 
garding the effects of community and job character- 
istics on general job and life satisfaction. Here again, 
a number of hypotheses can be made. It would be 
predicted that for the total sample of workers, satis- 
faction with specific aspects of the job would be 
more strongly related to general job satisfaction than 
to general life satisfaction. On the other hand, it is 
expected that satisfaction with community charac- 
teristics would be more closely related to general 
life satisfaction than to general job satisfaction and 
that the multiple correlations predicting general job
satisfaction would be larger than the multiple cor-
relations obtained when predicting general life satis-
faction. The larger correlations are expected for job
satisfaction because, while the job measures are a
relatively good sample of the job characteristics known
to be related to overall job satisfaction, the sampling of
community characteristics may not include many of
the most important characteristics which determine
general life satisfaction. Finally, differences should
be expected between the male and female samples of 
workers in terms of the variables which contribute 
to variation in overall life and job satisfaction. Such 
differences would confirm the differing motivational 
characteristics of the two groups and would provide 
evidence on the nature of the motivational differ- 
ences between males and females. 


RESULTS 


The data necessary to test the first hypoth- 
esis regarding differences in the workers’ re- 
sponses to the two communities are presented 
in Table 2 which gives the means for each of 
the 14 community satisfaction variables for 
the two groups of workers. The t ratios testing
the significance of the difference between 
the two groups are also presented for those 
variables for which predictions were made. It 
is recognized that this analysis technique is 
not completely appropriate for these data. 
While the alpha level for any particular com-
parison will be accurate, t tests on correlated
dependent variables will give an underestimate 
of the alpha level for the study as a whole. 
However, the other candidates for the test of 
the differences between the mean vectors 

TABLE 2
SATISFACTION WITH COMMUNITY CHARACTERISTICS

                                   Town A           Town B
                                  (N = 442)        (N = 28)
Satisfaction with                Mean    SD       Mean    SD        t       p
Medical facilities               4.21    .93      3.00    [?]      5.58    <.01
School facilities                4.00    .97      3.39    1.03     3.03    <.01
Weather                          2.60    1.02     3.07    [?]
Availability of housing          3.04    1.44     4.14    [?]      4.94    <.01
Availability of doctors          4.19    [?]      2.54    1.20     7.64    <.01
Adult recreational facilities    3.55    1.21     4.18    [?]
Teachers                         3.58    .99      3.68    [?]
Availability of dentists         2.71    1.30     1.89    1.03     4.04    <.01
Shopping facilities              2.45    1.14     1.86    .93      3.07    <.01
Cost of living                   2.14    1.00     3.39    1.34     5.49    <.01
Children’s recreation            3.65    1.07     3.25    1.17
Cost of housing                  2.52    1.32     4.64    .62      [?]     <.01
Location of community            2.90    1.22     3.02    1.25
Attractiveness of community      3.97    [?]      4.29    [?]








(T² analysis, discriminant function analysis),
while avoiding the issue of correlated depen- 
dent variables, are unjustified on other 
grounds (unequal variance-covariance ma- 
trices). 
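
As a rough illustration of the comparisons reported in Table 2, the following sketch computes a t ratio from summary statistics and a Bonferroni-style upper bound on the alpha level for the study as a whole. The unequal-variance form of the t ratio and the function names are assumptions made here; the article does not state which t formula was used, so the result need not reproduce the tabled values.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t ratio for two independent groups without pooling variances (assumed form)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def familywise_alpha_bound(alpha_per_test, n_tests):
    """Bonferroni upper bound on the chance of at least one false rejection."""
    return min(1.0, alpha_per_test * n_tests)


# School facilities row of Table 2: Town A (N = 442) versus Town B (N = 28).
t_school = welch_t(4.00, 0.97, 442, 3.39, 1.03, 28)

# Eight comparisons were predicted, each tested at the .01 level.
bound = familywise_alpha_bound(0.01, 8)

print(round(t_school, 2), round(bound, 2))
```

The bound holds whether or not the dependent variables are correlated, which is why it is a convenient, if conservative, way to express the concern about the alpha level for the whole set of tests.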

It can be seen that for the eight variables 
where differences were predicted, the differ- 
ences did occur, were always in the expected 
direction, and were sizable enough to reach 
significance. Significance, however, may be of 
little practical value since the total N is 470.
Such an N almost ensures statistical signifi-
cance for even small differences. The size of
the difference in comparison to the variance 
of the variables is usually large. For the six 
variables where no differences were predicted, 
the differences are generally smaller. Such re- 
sults confirm the effects of community char- 
acteristics on responses to the communities. 
Further, regarding responses to the commu-
nities as intervening variables, such results
are evidence that the differences between 
communities have the desired effect on the 
differences in the intervening variables. 
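
The remark about the size of a difference relative to the variance of a variable can be made concrete with a standardized mean difference. The sketch below is an illustration only: dividing by a pooled standard deviation is an assumption (the article itself notes that the two groups have unequal variances), and the function name is hypothetical.

```python
import math


def standardized_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Mean difference divided by the pooled standard deviation."""
    pooled_variance = (
        (n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2
    ) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_variance)


# Two rows of Table 2, Town A (N = 442) versus Town B (N = 28).
d_school = standardized_difference(4.00, 0.97, 442, 3.39, 1.03, 28)
d_shopping = standardized_difference(2.45, 1.14, 442, 1.86, 0.93, 28)

print(round(d_school, 2), round(d_shopping, 2))
```

Unlike the t ratio, this index does not grow with N, so it speaks to the practical size of a difference rather than to its statistical significance.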

It is also interesting to note that not only do
the differences in the responses to the two
communities confirm the hypothesis, but the
relative magnitudes of the satisfaction
measures within each of the communities are
also as expected. Those characteristics considered
to be outstanding in Town A, such as medical 














facilities, school facilities, and the availability 
of doctors, are in fact the three variables 
which obtained the highest mean satisfaction 
scores. Those variables considered to be 
worse, such as availability of dentists, cost of 
living, and cost of housing, did in fact receive 
very low satisfaction scores from these work- 
ers. The characteristics considered most out- 
standing in Town B were the availability of 
housing and the cost of housing. These two 
variables received two of the three highest 
satisfaction scores. The three characteristics in 
Town B considered outstandingly bad were 
the availability of doctors, availability of 
dentists, and shopping facilities. These vari-
ables also received the lowest mean satisfac-
tion scores. Therefore, the data in Table 2
not only confirm the differences between com-
munities but also show that the relative
magnitudes of the satisfaction scores within
communities are in line with expectation,
which is further evidence for the lawful
operation of these intervening variables.

Tables 3 and 4 present the correlations be- 
tween the workers’ responses to their com- 
munities and their job satisfactions. It was 
hypothesized that the workers’ satisfaction 
with the availability of housing, shopping 
facilities, cost of living, and cost of housing 
would have positive relationships with their 
satisfaction with pay. These four community 

TABLE 3

CORRELATIONS BETWEEN RESPONSES TO COMMUNITY AND SPECIFIC JOB
SATISFACTION MEASURES IN TOWN A

                                              Satisfaction with
Satisfaction with                Work    Pay     Promotions  Supervisor  Co-workers
Medical facilities               .19     .06     .02         [?]         [?]
School facilities                [?]     .10     .04         [?]         [?]
Weather                          [?]     .14     .06         .08         .21
Availability of housing          .18     .24a    -.04        .13         [?]
Availability of doctors          .10     [?]     -.01        .09         .20
Adult recreational facilities    [?]     .29     .04         .18         .26
Teachers                         .06     .06     .04         [?]         .16
Availability of dentists         .09     .18     .05         [?]         .19
Shopping facilities              .12     .26a    .02         .18         .16
Cost of living                   .19     .38a    .14         .18         .22
Children’s recreation            .24     .20     .10         .20         .20
Cost of housing                  .16     .30     .02         .16         .20
Location of community            .18     .12     .08         .14         .19
Attractiveness of community      [?]     .16     .05         .14         .18

Note. For an N of 442, r > .10, p < .05; r > .13, p < .01.
a Predicted to be positive and significant.

variables are reflecting (or are related to) the
cost of living in the community. Previous re-
sults indicate that variables indexing cost of
living have their strongest relationships with
pay satisfaction. Variables measuring satis-
faction with cost of living are expected to
have strong relationships with pay satisfac-
tion. The data in Table 3, based on the
sample of workers from Town A, support this
hypothesis. Satisfaction with adult recrea-
tional facilities and children’s recreational
facilities are the other two variables having
strong relationships with pay satisfaction.
Why these two variables should be related to

TABLE 4

CORRELATIONS BETWEEN RESPONSES TO COMMUNITY AND SPECIFIC JOB
SATISFACTION MEASURES IN TOWN B

                                              Satisfaction with
Satisfaction with                Work    Pay     Promotions  Supervisor  Co-workers
Medical facilities               -.08    .16     .09         .28         -.10
School facilities                [?]     .40     [?]         .36         .21
Weather                          .22     -.06    -.03        -.12        .28
Availability of housing          .06     .25a    [?]         -.05        -.09
Availability of doctors          -.12    -.21    -.04        .04         -.18
Adult recreational facilities    [?]     [?]     -.17        -.22        -.02
Teachers                         [?]     [?]     -.15        [?]         .28
Availability of dentists         .23     .00     -.08        .11         .06
Shopping facilities              [?]     [?]a    [?]         [?]         [?]
Cost of living                   .23     .44a    .05         .14         [?]
Children’s recreation            -.12    [?]     -.01        .05         -.07
Cost of housing                  -.09    [?]a    .08         [?]         -.29
Location of community            .28     .12     .01         .21         .36
Attractiveness of community      -.11    [?]     [?]         .32         .20

Note. For an N of 28, r > .37, p < .05; r > .47, p < .01.
a Predicted to be positive and significant.

TABLE 5

CORRELATIONS OF ALL SATISFACTION VARIABLES WITH
JOB IN GENERAL AND LIFE IN GENERAL OF MALES

Satisfaction variable                    JIG      LIG
JDI work                                 [?]      [?]
JDI pay                                  [?]      [?]
JDI promotion                            [?]      [?]
JDI supervisor                           [?]      [?]
JDI co-workers                           [?]      [?]
Management’s response to complaints      .32**    .22**
Training opportunities                   [?]      .09
Working conditions                       [?]      [?]
Medical facilities                       .04      .09
School facilities                        .03      [?]
Weather                                  [?]      [?]
Housing                                  [?]      [?]
Doctors                                  .07      [?]
Adult recreational facilities            [?]      [?]
Teachers                                 .04      [?]
Dentists                                 .14*     [?]
Shopping facilities                      [?]      [?]
Cost of living                           [?]      [?]
Child recreation                         [?]      [?]
Cost of housing                          [?]      .09
Location of town                         [?]      [?]
Attractiveness of town                   [?]      .24**

Note. N = 387.
* p < .05; two-tailed test.
** p < .01; two-tailed test.

pay satisfaction is unclear. It should be noted, 
however, that these two responses to the 
community show the most general relation- 
ships with all of the measures of specific job 
satisfaction. The data in Table 3 also indicate 
moderate relationships between satisfaction 
with community characteristics and satisfac- 
tion with actual work done. This is in line 
with previous results. Not in line with previ- 
ous results are the relationships between com- 
munity characteristics and co-worker satis- 
faction. Satisfaction with co-workers generally 
has not behaved in the lawful manner that the 
other job satisfaction variables have. It is 
usually unrelated to the predictor variables 
which are useful in understanding variation in 
other aspects of job satisfaction. 

The results in Table 4 based on the workers 
from Town B give only limited support to 
the second hypothesis. Three of four com- 
munity variables hypothesized to have posi- 
tive relationships with pay satisfaction tend to 
have high relationships but only two are sig- 
nificant. Satisfaction with shopping facilities 
does not fall in line with the remainder of 
the results. In addition, satisfaction with
school facilities and satisfaction with the at- 
tractiveness of the community also show sub- 
stantial relationships with pay satisfaction. 
It should be pointed out that these results 
are based on a sample of only 28 workers. 
The reliability of the correlations is not high. 
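
The limited reliability of correlations based on 28 workers can be illustrated with an approximate confidence interval. The sketch below uses the Fisher z transformation with a normal critical value of 1.96; this method and the illustrative r of .40 are assumptions introduced here, not an analysis reported in the article.

```python
import math


def correlation_interval(r, n, z_crit=1.96):
    """Approximate 95% interval for a correlation via the Fisher z transformation."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# The same illustrative correlation at the two sample sizes of the study.
low_b, high_b = correlation_interval(0.40, 28)    # Town B sample size
low_a, high_a = correlation_interval(0.40, 442)   # Town A sample size

print(round(low_b, 2), round(high_b, 2))
print(round(low_a, 2), round(high_a, 2))
```

With N = 28 the interval spans most of the positive range, whereas with N = 442 it is comparatively narrow, which is the sense in which the Town B correlations should be read with caution.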

Taken together, the data presented in Tables 
3 and 4 generally support the second hypothe- 
sis. The two samples were combined and all 
future results will be based on the combined 
sample of workers from the two towns. 

Tables 5 and 6 present the relationships be- 
tween all of the job satisfaction variables and 
the community satisfaction variables on the 
one hand and general job and life satisfac- 
tion on the other. Table 5 contains the data 
from the male workers and is based on an N
of 387. The five JDI satisfaction variables and 
the three graphic rating scales assessing satis- 
faction with specific aspects of the job gen- 
erally have positive relationships with general 
job satisfaction. Seven of these job satisfaction 


TABLE 6

CORRELATIONS OF ALL SATISFACTION VARIABLES
WITH JOB IN GENERAL AND LIFE
IN GENERAL OF FEMALES

Satisfaction variable                    JIG      LIG
JDI work                                 [?]      .14
JDI pay                                  .29**    -.01
JDI promotion                            .30**    -.01
JDI supervisor                           [?]**    [?]
JDI co-workers                           [?]      [?]
Management’s response to complaints      [?]**    .06
Training opportunities                   .08      [?]
Working conditions                       .40**    [?]
Medical facilities                       .01      .05
School facilities                        [?]      .08
Weather                                  [?]      [?]
Housing                                  [?]      .08
Doctors                                  .09      .03
Adult recreational facilities            .10      .20
Teachers                                 .07      .07
Dentists                                 .03      -.01
Shopping facilities                      .22*     .10
Cost of living                           [?]      .08
Child recreation                         .07      [?]
Cost of housing                          .16      [?]
Location of town                         -.02     [?]
Attractiveness of town                   .14      .09

Note. N = 80.
* p < .05; two-tailed test.
** p < .01; two-tailed test.

TABLE 7

VARIMAX ROTATED FACTOR MATRIX OF WORK-RELATED
VARIABLES OF MALES

                                                    Factor
                                         I              II          III
Satisfaction variable                    Interpersonal  Intrinsic   Extrinsic
                                         relations
JDI work                                 .36            .10         [?]
JDI pay                                  .04            .08         .91
JDI promotion                            .03            .84         .10
JDI supervision                          .56            .38         .29
JDI co-workers                           .68            .38         .07
Management’s response to complaints      .65            .06         .50
Training opportunities                   .19            .06         .02
Working conditions                       .29            [?]         .44
% Total variance explained               25             20          17
% Common variance explained              40             32          27

Note. N = 387.


variables also have significant relationships 
with LIG for the male workers but the rela- 
tionships are all lower. In addition, it can be 
seen that the community satisfaction variables 
tend to have stronger relationships with LIG 
satisfaction than they do with JIG satisfac- 
tion. There are 3 out of 14 reversals among 
these relationships, however, and the differ- 
ences between the correlations are usually 
small. 

The data in Table 6 are somewhat different. 
The relationships shown in Table 6 are based 
on the sample of 82 female workers. Seven 
of the eight job satisfaction variables again 
show strong relationships to JIG satisfaction. 
However, these seven variables are the only 
variables which bear substantial relationships 
to satisfaction with JIG for this sample. The 
only other variables which are significantly 
correlated with JIG for females are three of 
the four variables assumed to be reflecting 
satisfaction with the cost of living in the com- 
munities. It can also be seen that for female 
workers the predictability of LIG satisfaction 
is limited with only four variables having sig-
nificant correlations with this variable. These
significant correlations also tend to be rather 
low. 
The interpretation of the relationships in
Tables 3, 4, 5, and 6 is complicated by the 
presence of so many variables. The commu- 
nity- and job-related satisfaction variables 
were subjected to principal component analy- 
ses for the male and female workers sepa-
rately, in order to simplify the interpretations.
Table 7 presents the three-factor Varimax 
rotated solution for the eight work-related 
variables for the male sample. (In this and all 
subsequent factor analyses in this study, prin- 
cipal axis solutions with unities in the main 
diagonal were used. In all cases several differ- 
ent numbers of factors were rotated to the 
Varimax criterion. The solutions presented are 
the ones which appeared best in summarizing 
the data and making psychological sense.) 
These three factors explained 62% of the 
total variance for males. The first factor was 
interpreted either as a general factor or as 
an interpersonal relations satisfaction factor. 
JDI supervision satisfaction, JDI co-worker 
satisfaction, satisfaction with management’s 
responsiveness to complaints, and satisfaction 
with training programs all involve some ele- 
ment of the workers’ response either to their 
co-workers or to the company management in 
the plant. The second factor was interpreted 
as an intrinsic job satisfaction factor since it 
was defined by high loading of JDI work and 
JDI promotion (advancement) satisfaction. 
The third factor was interpreted as an ex- 
trinsic job satisfaction factor and was defined 
by high loadings for JDI pay satisfaction
and satisfaction with working conditions in the plant.

The 14 community satisfaction variables 
for the male workers were also factor ana- 
lyzed. These results are presented in Table 
8. The five components were rotated to 
the Varimax criterion. These five factors ex- 
plained 65% of the total variance. The first 
factor extracted was interpreted as an eco- 
nomic factor and was defined by high loadings 
of satisfaction with availability of housing, 
satisfaction with shopping facilities, satisfac- 
tion with the cost of living, and satisfaction 
with the cost of housing. The second factor 
was interpreted as satisfaction with the physi- 
cal setting of the community. It was defined 
by high loadings of satisfaction with the 
weather in the community, satisfaction with 

TABLE 8

VARIMAX ROTATED FACTOR MATRIX OF COMMUNITY VARIABLES OF MALES

                                                          Factors
                                      I           II          III         IV          V
Satisfaction variable                 Economic    Physical    Medical     Recreation  Education
                                      factors     setting     facilities  facilities  facilities
Medical facilities                    -.05        [?]         .17         .25         [?]
School facilities                     .10         .04         .36         .14         .74
Weather                               [?]         .81         [?]         -.01        .02
Housing availability                  .76         [?]         .08         .08         .20
Doctor availability                   .06         [?]         .84         .07         .04
Adult recreational facilities         .28         .16         .08         .68         .14
Teachers                              .13         .09         -.03        [?]         .86
Dentist availability                  [?]         -.09        .52         -.12        .14
Shopping facilities                   [?]         [?]         .24         .25         .02
Cost of living                        [?]         .43         -.12        .19         -.10
Children’s recreation                 .02         .09         [?]         .85         [?]
Cost of housing                       .19         .20         -.08        .05         .09
Location of community                 .24         .78         .03         [?]         .06
Community attractiveness              .08         .54         [?]         [?]         [?]
% Total variance accounted for        17          14          13          11          10
% Common variance accounted for       26          21          20          17          16

Note. N = 387.


the attractiveness of the community, and 
satisfaction with the location of the com- 
munity. The third factor was interpreted as 
satisfaction with medical facilities in the com- 
munity and was defined by high loadings of 
satisfaction with availability of dentists. The 
fourth factor was labeled as satisfaction with 
recreational facilities in the community and 
was defined by high loadings of satisfaction 
with adult recreation, and satisfaction with 
children’s recreational facilities. The fifth 
factor was interpreted as satisfaction with 
the educational facilities in the community.
It was defined by high loadings of
satisfaction with school facilities and the 
satisfaction with the teachers in the school 
system. Factor scores were computed for the 
male workers on these eight factors.² Mul-
tiple regression analyses were then done be-
tween these eight factor scores as predictor
variables and JIG and LIG as two criterion 


2 The formula used for computing the two sets of
factor scores was Z R⁻¹ F, where Z is an N × M data
matrix of M standard scores for each of N Ss, R⁻¹ is
the inverse of the M × M intercorrelation matrix, and
F is the M × K factor matrix of the loadings of the
M variables on the K orthogonal factors.


variables. The results of these analyses are 
shown in Table 9. These results indicate that 
the job satisfaction variates have substantial 
relationships (in terms of standard partial 
beta weights) with JIG satisfaction and lower 
relationships with LIG satisfaction. On the 
other hand, only one of the five community 
factors (economic factors in the community) 
has a sizable relationship with LIG satisfac- 
tion and none is strongly related to JIG satis- 


TABLE 9

STANDARDIZED PARTIAL REGRESSION WEIGHTS
PREDICTING JOB IN GENERAL AND LIFE IN
GENERAL SATISFACTION OF MALES

Predictor variable                       JIG      LIG
Interpersonal relations                  .42      [?]
Intrinsic job satisfaction               .33      [?]
Extrinsic job satisfaction               .24      [?]
Economic factors in communities          [?]      [?]
Physical setting of communities          .06      [?]
Medical facilities in communities        -.15     -.01
Recreation facilities in communities     .01      .08
School facilities in communities         .01      [?]

R                                        .55*     .44*

Note. N = 387.
* p < .01.

TABLE 10

VARIMAX ROTATED FACTOR MATRIX OF WORK-
RELATED VARIABLES OF FEMALES

                                              Factors
                                         I          II
Satisfaction variables                   General    Extrinsic
JDI work                                 .67        .38
JDI pay                                  .05        .76
JDI promotion                            .67        [?]
JDI supervisor                           .18        [?]
JDI co-workers                           .66        [?]
Management response to complaints        .54        [?]
Training opportunities                   [?]        -.09
Working conditions                       .24        .61
% Total variance accounted for           29         26
% Common variance accounted for          53         47

Note. N = 81.


faction. For this sample of 387 male workers 
the multiple correlation between the eight pre- 
dictor variables and JIG satisfaction is .55. 
The multiple correlation predicting LIG satis- 
faction is .44. Both of these multiple correla- 
tions are significant, p < .01. 
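
The factor-score formula given in Footnote 2, Z R⁻¹ F, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the helper names are hypothetical, the matrices are stored as plain lists of rows, and the two-variable, one-factor data are invented for the demonstration and are not taken from the study.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [matrix | I].
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    # The right-hand half now holds the inverse.
    return [row[n:] for row in aug]


def factor_scores(z, r, f):
    """Return the N x K matrix Z R^-1 F described in Footnote 2."""
    return matmul(matmul(z, inverse(r)), f)


# Hypothetical example: N = 2 subjects, M = 2 variables, K = 1 factor.
z = [[1.0, 1.0], [-1.0, 0.5]]   # standard scores
r = [[1.0, 0.5], [0.5, 1.0]]    # intercorrelation matrix
f = [[0.9], [0.9]]              # factor loadings
scores = factor_scores(z, r, f)
print(scores)
```

Each row of the result holds one subject's scores on the K factors; in the study these scores served as the predictor variables in the multiple regression analyses.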
The intercorrelations of the eight job-re-
lated variables for the female workers were
also factor analyzed. Two components from 
this analysis were rotated to the Varimax 
criterion. The Varimax rotated matrix is 
presented in Table 10. The first factor ex- 
tracted appears to be basically a general 
factor with high loadings for JDI work, JDI 
promotions, JDI co-workers, management’s 
responsiveness to complaints, and satisfaction 
with training opportunities. The second factor 
is clearly an extrinsic job satisfaction factor. 
It is defined by high loadings of JDI pay, 
JDI supervisor, and satisfaction with work- 
ing conditions. These two factors explained 
55% of the total variance. 

The intercorrelations of the 14 community 
satisfaction variables were also factor ana- 
lyzed. Again a five-factor solution appeared to 
be the best way of summarizing these data. 
The factor matrix is presented in Table 11. 
These five factors appear to be identical to 
the five factors extracted from the data pro- 
vided by the male sample although they were 
extracted in a different order. The factors 
were interpreted as satisfaction with medical 


TABLE 11

VARIMAX ROTATED FACTOR MATRIX OF COMMUNITY VARIABLES OF FEMALES

                                                          Factors
                                      I           II          III         IV          V
Satisfaction variable                 Medical     Economic    Physical    Recreation  Education
                                      facilities  factors     setting     facilities  facilities
Medical facilities                    .90         .02         .09         .06         .07
School facilities                     [?]         .06         -.14        .13         .60
Weather                               -.10        .06         .79         -.00        [?]
Housing availability                  .06         .87         .02         .08         .18
Doctor availability                   .88         .06         -.04        .08         .08
Adult recreational facilities         .04         .29         .05         .81         -.15
Teachers                              .03         .10         [?]         -.04        .90
Dentist availability                  .59         .09         [?]         -.50        -.06
Shopping facilities                   .50         .34         .34         -.04        [?]
Cost of living                        .08         .78         .31         .09         -.06
Children recreation                   .14         -.06        .38         .66         [?]
Cost of housing                       -.05        .89         [?]         .05         .02
Location of community                 .13         .18         .63         .00         .03
Community attractiveness              .16         .14         [?]         .22         -.10
% Total variance accounted for        18          17          15          10          10
% Common variance accounted for       26          25          21          15          14

Note. N = 81.

facilities, satisfaction with the economic fac-
tors in the community, satisfaction with the
physical setting of the community, satisfaction 
with the recreational facilities in the com- 
munity, and satisfaction with educational fa- 
cilities in the community. These five factors 
explain 70% of the total variance of the 
original 14 × 14 intercorrelation matrix.

Scores were computed on each of these 
seven (two work-related and five community- 
related) factors for each of the 81 female 
workers. Two multiple regression analyses 
were carried out to examine the relationship 
between the seven predictor variables and JIG 
and LIG satisfaction. The results of these 
analyses are given in Table 12. An examina- 
tion of Table 12 indicates that of the seven 
predictor variables only two bear substantial 
and significant relationships to JIG. These 
are intrinsic job satisfaction and extrinsic job 
satisfaction. The multiple correlation of .67 
for predicting JIG satisfaction is highly sig- 
nificant. The results for predicting LIG satis- 
faction are neither consistent nor impressive. 
None of the beta weights for the predictor 
variables is significant and the multiple cor- 
relation, .36, was not significant. 
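
The two multiple correlations can be restated as proportions of variance and F ratios. The sketch below applies the standard F test for a multiple correlation; the article does not say which significance test was used, so this choice, like the function name, is an assumption.

```python
def multiple_r_summary(multiple_r, n_cases, n_predictors):
    """Return R squared, the F ratio, and its degrees of freedom."""
    r_squared = multiple_r ** 2
    df_model = n_predictors
    df_error = n_cases - n_predictors - 1
    f_ratio = (r_squared / df_model) / ((1.0 - r_squared) / df_error)
    return r_squared, f_ratio, df_model, df_error


# Female sample: seven factor-score predictors, N = 81.
jig = multiple_r_summary(0.67, 81, 7)   # predicting JIG satisfaction
lig = multiple_r_summary(0.36, 81, 7)   # predicting LIG satisfaction

print([round(v, 2) for v in jig[:2]], jig[2:])
print([round(v, 2) for v in lig[:2]], lig[2:])
```

The F ratio for JIG is several times that for LIG on the same degrees of freedom, in line with the report that only the first multiple correlation was significant.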

Canonical regression analyses were also 
carried out predicting JIG and LIG job satis- 
faction jointly for male and female workers. 
These two canonical analyses were performed 
in order to examine the possible interactive 
effects of JIG and LIG satisfaction. The re- 
sults of these two analyses in no way made 
the interpretation of the data any more clear 
or consistent. They will not be reported here. 


DISCUSSION 


The results of this study lend substantial 
support to the validity of the assumptions 
made by Kendall (1963) and Hulin (1966) 
in their analysis of the effects of community 
characteristics on job satisfaction. The data 
in this study demonstrate that differences be- 
tween communities result in predictable dif- 
ferences in the workers’ satisfaction with 
these communities. These same community 
characteristics which result in differences in 
their satisfaction with the cost of living in 
the community also have a significant effect 
on their satisfaction with pay. Secondly, it 

TABLE 12

STANDARDIZED PARTIAL REGRESSION WEIGHTS
FOR PREDICTING JOB IN GENERAL AND LIFE
IN GENERAL SATISFACTION OF FEMALES

Predictor variable                       JIG      LIG
Intrinsic job satisfaction               [?]      .20
Extrinsic job satisfaction               [?]      .07
Medical facilities                       -.21     -.06
Economic factors                         -.05     -.02
Setting                                  -.02     [?]
Recreational facilities                  -.01     [?]
Educational facilities                   .02      -.01

R                                        .67*     .36

Note. N = 81.
* p < .01.


was demonstrated that the workers’ satisfac- 
tion with the economic characteristics of the 
community had the expected effect on their 
satisfaction with pay. Thirdly, it was demon- 
strated that workers’ satisfaction with com- 
munity characteristics and satisfaction with 
job characteristics considered jointly had sig- 
nificant and predicted effects on their satis- 
faction with their JIG and their satisfaction 
with their LIG. Finally, differences between
male and female workers in terms of the 
variables which were related to overall job 
and life satisfaction were reasonable. The
multiple correlations predicting general job
satisfaction from community and specific job
satisfaction variables were not only significant
but also substantial. For males, the eight
predictor variables accounted for approximately
30% of the variance in job satisfaction, and for
females, the seven predictor variables accounted
for approximately 45% of the variance. Therefore,
not only are the relationships statistically sig- 
nificant, but they are large enough to be 
considered as practically significant. 

Kendall (1963), Hulin (1966), and Smith, 
Kendall, and Hulin (1969) have main- 
tained that the standards against which work- 
ers compare the level of their environmental 
return cannot be considered to be constant 
from one community to another, from one 
plant to another within the same community, 
or even from one worker to another within 
the same plant. These writers have stressed 
that using an adaptation level or frame of 
reference set of hypotheses to analyze workers’
responses to their jobs gives a much more 
consistent and meaningful set of results than 
simply assuming that what we as middle-class 
white investigators regard as good is what all 
workers will regard as good. The results of a 
small number of studies now done on this 
problem indicate that the assumptions made 
by Kendall, Hulin, and Smith are valid and 
are useful in understanding workers’ re- 
sponses to their jobs. Present data along with 
these previous studies also indicate that situa- 
tional variables should no longer be con- 
sidered as nuisance parameters to be con- 
trolled or partialed out of our predictive 
equations. They should be regarded as a 
valid and meaningful source of variance and 
their effects should be analyzed rather than 
removed. 

Finally, these results in conjunction with 
the results of Blood and Hulin (1967) and 
Hulin and Blood (1968) indicate that not 
only must the effects of differences in prefer- 
ence for work role outcomes be considered but 
also differences in standards for judging the 
goodness or badness of any given level of any 
given work role outcome. While such a con- 
sideration undoubtedly complicates the life 
of the investigator, the benefits to be gained 
from such analyses are enormous. The ubiqui- 
tous sex differences in both job satisfaction 
and job motivation which have plagued in- 
vestigators for years can probably be con- 
sidered as simply a combination of differences 
in preferences for work role outcomes and 
standards for judging these work role out- 
comes. Likewise the earlier results of Katzell, 
Barrett, and Parker (1961), Turner and 
Lawrence (1965), and Whyte (1955) report- 
ing urban-rural differences in job motivation 
are understandable as a combination of both 
differences in preference and difference in 
standards. Further, by using one set of 
moderator variables chosen to index individual 
differences in preferences for work role out- 
comes and another set of moderator variables 
designed to index internal standards for the 
judgment of levels of work role outcomes, 
data from large groups of industrial workers 
employed by different companies and in dif- 
ferent communities can be analyzed without 
resorting to laborious subgroup analyses. If
such moderator variables are wisely chosen 
and accurately measured, moderated regres- 
sion analyses predicting either motivation or 
job satisfaction should be useful for under- 
standing the motivation of industrial workers. 

Finally, the use of the implications con- 
tained in these data in conjunction with in- 
strumentality theory (Vroom, 1964) may 
provide a means of linking these two ap- 
proaches to the study of job satisfaction. The 
instrumentality theory as discussed by Vroom 
is completely ahistorical. No concern is paid 
to the sources of variance in job satisfaction, 
why different workers have different valences, 
or why two workers will attribute different 
instrumentalities to the same job for provid- 
ing some second-level outcome. Instrumen- 
tality theory does provide a link between at- 
titudes and behaviors. The traditional model, 
on the other hand, is basically concerned 
with the development of high or low job 
satisfaction. It provides no link between at- 
titudes and behaviors. However, these two 
approaches could be easily and usefully com- 
bined. The traditional model could be used 
to predict the valence and instrumentality of 
Variable i for homogeneous subsets of work- 
ers. These predicted valences and instrumen- 
talities could then be combined to obtain 
Σ VᵢIᵢ (a measure of overall job satisfaction).
Such an estimate of overall satisfaction could 
be validated. The advantage of such an ap- 
proach would be that all the variables would 
be experimentally independent of S. There- 
fore, use of such a combination would break 
the sterile confines of the ahistorical instru- 
mentality model and at the same time avoid 
the problems of response—response laws. 
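
The combination proposed above, a sum over outcomes of valence times instrumentality, can be written out as a short sketch. The function name and the numerical values are hypothetical and serve only to show the arithmetic; in the proposed approach the valences and instrumentalities would be predicted from the traditional model rather than supplied by hand.

```python
def overall_satisfaction(valences, instrumentalities):
    """Sum of valence times instrumentality over work role outcomes."""
    if len(valences) != len(instrumentalities):
        raise ValueError("each outcome needs one valence and one instrumentality")
    return sum(v * i for v, i in zip(valences, instrumentalities))


# Hypothetical predicted values for three outcomes in one subset of workers.
predicted_valences = [0.8, 0.5, -0.2]
predicted_instrumentalities = [0.9, 0.4, 0.6]

score = overall_satisfaction(predicted_valences, predicted_instrumentalities)
print(round(score, 2))
```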
VOCATIONAL CHOICE AND PROFESSIONAL EXPERIENCE


AS FACTORS IN OCCUPATIONAL IMAGE 


EDMOND MARKS AND SAM C. WEBB 1


Georgia Institute of Technology 


Trait descriptions of the typical incumbent of two distinct occupational titles 
were obtained from three groups varying in professional experience: (a) fresh- 
man students beginning their occupationally relevant education, (b) senior 
students completing this education, and (c) postgraduates working in the 
occupation. In addition, self-ratings on the same traits were obtained from a 
sample of freshman students enrolled in the two majors. The self- and other- 
ratings were also related to the social desirability (SD) ratings of the set of 
trait descriptions. The results indicated that the three groups varying in pro- 
fessional experience share a common “image” of the typical occupational in- 
cumbent, with this image being substantially related to the self-characteriza- 
tions of freshmen enrolled in that major. Although SD ratings were highly 
related to the average trait characterizations, it was suggested that not all the 
differences in these latter responses could be accounted for in terms of socially
desirable response tendencies.


Many theories of vocational choice, adjust- 
ment, and development focus upon the no- 
tion that vocational behavior—preference, 
choice, and performance—can be explained in 
terms of the congruence between the pattern 
of attributes or traits of the individual ex- 
hibiting the behavior and the pattern of at- 
tributes or traits of some external model, for 
example, the typical incumbent of an occupa- 
tional category like lawyer, engineer, etc., or 
students of a given achievement level (Bor- 
din, 1943; O’Dowd & Beardslee, 1960; Super, 
1963; Tiedeman & O’Hara, 1963). 

In this study, the effects of two factors on 
the description of the personal characteristics 
of the typical incumbent of a representative 
occupation made by Ss listing that occupation 
as their vocational choice were examined. The 
two factors studied were occupational title— 
for example, electrical engineering or architec- 
ture—and amount of training or experience 
related to vocational preparation. In addition 
to these descriptions of some idealized or 
typical “other” made by the groups studied, 
the self-characterizations held by a sample of 
entering college freshmen who listed the occupation
title as their college major were obtained.
Of particular interest was the congruence
between these self-descriptions and
those of the typical incumbent of the occupational
title.


1 Requests for reprints should be sent to Sam C.
Webb, Dean, Division of Graduate Studies, Georgia
Institute of Technology, Atlanta, Georgia 30332.

Previous research (O’Dowd & Beardslee, 
1960) suggests that college students possess 
fairly reliable stereotypes of a wide variety of 
high-level occupations and that these stereo- 
types differ substantially among various oc- 
cupations. Changes in occupational images 
related to time or experience, on the other 
hand, have not been demonstrated. O’Dowd 
and Beardslee (1960), who compared fresh- 
man and senior images of selected high-level 
occupations, found little difference between 
these two levels of educational development. 
The failure to obtain a reliable relationship 
between experience and change in occupa- 
tional image may well be due to the interval 
studied. The greatest and perhaps most rapid
change may emerge after the individual 
enters the world of work. Also of interest in 
this study was the role of social desirability 
in self- and other-description. 


METHOD 
Procedure 


Eight groups of Ss were established based partly 
on the cross-classification of the occupational title 
and professional experience factors. Actually, two 
of the groups fell outside the 2 × 3 cross-classificatory
scheme. The two occupational titles selected for 
study were Industrial Management and Electrical 
Engineering. Rather specific titles were used in an 
effort to avoid confounding of possible differences 
among occupational incumbents all falling within
a larger title, engineering, for example. Although a 
larger number of occupational titles were actually 
sampled, these two were selected, on the basis of 
representativeness, to simplify the analysis. 

The three levels of amount of training or experi- 
ence established were freshman, senior, and pro- 
fessional. The freshman level consisted of students 
who had not begun any formal college course work, 
but who listed either industrial management or 
electrical engineering as their choice of major. Seniors 
were those students in these two occupational titles 
who were scheduled to graduate within either of 
these majors at the next commencement. The “pro- 
fessionals” were Ss who had obtained at least a 
baccalaureate degree in one of the two occupational 
titles and who were presently working in that oc- 
cupation. 

The two final groups were freshmen—again not 
having undertaken any formal college course work— 
who listed either of the occupational titles as their 
choice of major. These Ss were asked to describe 
themselves in terms of the same trait-descriptions 
used by other groups. 

The six groups comprising the 2 × 3 classification
defined by occupational title and professional ex- 
perience were administered a 95-item trait-descrip- 
tion instrument. The 95 items were selected to cover 
a broad range of behaviors or characteristics that a 
person might exhibit and which would be relevant 
to occupational endeavors. Some sample items are 
high intellectual ability, self-assurance, flexibility of 
thought, social immaturity, good control of impulses, 
and proneness to anxiety. The S was asked to rate 
each characteristic according to its importance in 
determining the success or failure of an individual 
working in the occupation that S had selected as 
his vocational choice. The S indicated his item response
on a 9-point rating scale ranging from (1)
“a factor of very great importance in determining
failure,” through (5) “a factor that would have no
bearing on an individual’s success or failure,” to
(9) “a factor of very great importance in determining
success.”

The two groups of freshman Ss who responded 
with self-descriptions were asked to rate the same 
95 items with respect to how characteristic the given 
trait was of them. The Ss were asked to rate each 
item on a 9-point scale ranging from “very definitely 
not characteristic of me” through “neither characteristic
nor not characteristic of me,” to “very
definitely characteristic of me.” 


Analyses 


Several related analyses of the data were undertaken.
In the first analysis the scale values
and discriminal dispersions of each of the 95 trait 
terms were obtained using Thurstone’s model for 
categorical judgment (Torgerson, 1958). The scale 
values and standard deviations were obtained sepa- 
rately for each of the eight groups by means of a 
graphical procedure for successive categories de- 
scribed by Bock and Jones (1963). Based upon these 
scale values the trait descriptions falling in the two
most extreme categories on both ends of the scale— 
Categories 1, 2, and 8, 9—were selected for further 
treatment. Items meeting this criterion could be 
considered as the most salient in characterizing the 
typical incumbent of the given occupation. To ex- 
plore the notion of a common occupational stereo- 
type among the four groups representing either of 
the occupational titles, the percentages of item over- 
lap, and rank order correlations—ranked on scale 
value over the common items—were computed for 
the $n(n-1)/2$ pairs of groups within an occupational
title. High values on both indexes would suggest 
that the groups within a given occupational title 
share a common image of the typical incumbent of 
that occupation. To provide an overall test of the 
hypothesis of a common occupational profile, a test 
described by Marks (1968) based on the principal 
components of the individual correlation matrices 
was employed. 
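
The scaling step can be sketched numerically. The code below is a simplified version of the successive-categories model that assumes equal (unit) discriminal dispersions for all items. It is not the graphical procedure of Bock and Jones (1963), which also estimates the dispersions. The function name and the example frequencies are hypothetical.

```python
from statistics import NormalDist, mean


def successive_category_scale(freqs, eps=1e-6):
    """Estimate item scale values from ordered-category frequencies.

    freqs maps each item to a list of counts for categories 1..m.
    Every item is assumed to have unit discriminal dispersion.
    Returns (scale_values, boundaries).
    """
    nd = NormalDist()
    z = {}
    for item, counts in freqs.items():
        total = sum(counts)
        cum, row = 0.0, []
        for c in counts[:-1]:  # m - 1 category boundaries
            cum += c
            p = min(max(cum / total, eps), 1 - eps)  # avoid infinite deviates
            row.append(nd.inv_cdf(p))
        z[item] = row
    n_bounds = len(next(iter(z.values())))
    # Boundary estimate: mean normal deviate over items at each boundary.
    boundaries = [mean(z[i][g] for i in z) for g in range(n_bounds)]
    # Under unit dispersion z_ig = t_g - s_i, so s_i = mean over g of (t_g - z_ig).
    scale = {i: mean(b - zi for b, zi in zip(boundaries, z[i])) for i in z}
    return scale, boundaries


# Hypothetical counts on the 9-point importance scale for two trait terms.
ratings = {
    "High intellectual ability": [1, 1, 1, 2, 6, 10, 20, 34, 25],
    "Social immaturity": [20, 30, 22, 12, 8, 4, 2, 1, 1],
}
scale, boundaries = successive_category_scale(ratings)
```

With this estimator the scale values are centered on zero, so only differences among items are meaningful; a trait rated mostly in the upper categories receives the higher value.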


Social Desirability 


Another aspect of the present study considered 
relevant to the hypothesis of a common occupational 
image concerned the role of SD in responses to the 
trait items. The 95 trait names had been scaled by 
the method of successive categories from the responses
of an independent sample of freshman
students. The rank order correlations between these
SD scale values and the scale values obtained on 
each of the four groups within an occupational title 
were calculated. Should high correlational estimates 
obtain, Ss’ “true” characterizations of an occupa- 
tional incumbent might be confounded with their 
implicit notions of what constitutes socially desirable 
traits in an idealized person. 
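
A minimal sketch of the rank order index used in these analyses: Spearman's rho computed as the Pearson correlation between average ranks, applied to every pair of groups over the items they share. The group labels, item labels, and scale values are hypothetical, and the helper names are introduced only for illustration.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical scale values for four items in three groups.
groups = {
    "freshman": {"item_1": 1.8, "item_4": 1.5, "item_6": 1.1, "item_39": -0.8},
    "senior": {"item_1": 1.7, "item_4": 1.2, "item_6": 1.3, "item_39": -1.2},
    "professional": {"item_1": 1.8, "item_4": 1.6, "item_6": 1.7, "item_39": -1.8},
}

# n(n - 1)/2 pairwise comparisons over the items common to each pair.
pairwise = {}
for g1, g2 in combinations(groups, 2):
    common = sorted(set(groups[g1]) & set(groups[g2]))
    pairwise[(g1, g2)] = spearman_rho(
        [groups[g1][i] for i in common], [groups[g2][i] for i in common]
    )
```

The same function applies when one array holds SD scale values and the other holds a group's importance ratings for the same items.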


Subjects 


A total of 674 Ss were used in the study. The 
breakdown was as follows. For industrial manage- 
ment there were 50 freshmen who engaged in self- 
description, and 65 freshmen, 84 seniors, and 81 
professionals who rated a typical occupational incumbent.
For electrical engineering there were 50 freshmen
who engaged in self-description and 58 freshmen,
61 seniors, and 75 professionals who described
the incumbent. In addition, a sample of 150 fresh- 
men from another university rated the 95-trait de- 
scriptions for SD. 
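
The reported total can be checked against the group sizes listed above. The dictionary keys are labels chosen for this sketch.

```python
# Group sizes as reported in the Subjects section.
industrial_management = {
    "self_description": 50, "freshman": 65, "senior": 84, "professional": 81,
}
electrical_engineering = {
    "self_description": 50, "freshman": 58, "senior": 61, "professional": 75,
}
sd_raters = 150  # independent freshman sample that rated social desirability

total = (
    sum(industrial_management.values())
    + sum(electrical_engineering.values())
    + sd_raters
)
```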


RESULTS 
Level of Professional Experience 


The results will be discussed first in terms 
of the professional experience dimension for 
the two occupational titles separately. 

Industrial management. For the three levels 
of professional experience—freshman, senior, 
and professional—there were 43, 47, and 46 
items, respectively, which fell into the two 
most extreme categories on the “importance” 
dimension; that is, they were considered
descriptive of the typical occupational incumbent.
Of these total numbers of items, 35
items were common to all three groups. That
is, 81% of the freshman items, 74% of the
senior items, and 76% of the professional
items were shared with the other two groups.
This suggests considerable overlap or common
perceptions of the typical industrial management
incumbent by the three groups. Of
the items that did not match across all three
groups, 63% of the freshman items, 83% of
the senior items, and 63% of the professional
items were shared with at least one of the
other groups. Eighty percent of the freshman
single-matched items (items that were shared
with one of the other two groups) were shared
with the seniors, while 20% were shared with
the professionals. Of the senior single-matched
items, 40% were shared with the freshmen
and 60% were shared with the professionals.
Finally, of the professional single-matched
items, 14% were shared with the freshmen
and 86% were shared with the seniors. There
appears to be a slight shift in terms of common
stereotype as one moves from freshman
to senior to professional. The freshmen and
professionals are most distinct in terms of
occupational image, while the seniors take up
a position somewhere between the two. These
comments are limited simply to the total number
and percentage of attributes or traits that
the three groups agree are descriptive of the
typical industrial management incumbent.


TABLE 1

SCALE VALUES AND DISCRIMINAL DISPERSIONS^a OF THE 35 COMMON ITEMS FOR THE
INDUSTRIAL MANAGEMENT GROUPS

No.  Item                                                               Freshman      Senior        Professional
1    Excellent educational training                                     1.83 (.68)    1.67 (.60)    1.75 (.65)
2    High level of physical energy                                      1.11 (.74)    1.45 (.70)    1.65 (.99)
4    Efficient working habits                                           1.46 (.81)    [illegible]   [illegible]
6    High intellectual ability                                          1.83 (.71)    1.25 (.75)    1.60 (.81)
--   Inability to exercise authority                                    -.99 (.99)    -1.23 (.77)   -3.00 (.99)
--   Little confidence in ability                                       -1.00 (.35)   -1.65 (.99)   -3.00 (.99)
--   Conscientiousness                                                  1.08 (.99)    1.67 (.92)    1.80 (.99)
--   Self-assurance                                                     1.17 (.89)    1.67 (.86)    1.67 (.60)
--   Self-discipline                                                    1.17 (.65)    1.75 (.67)    1.88 (.61)
--   Inability to work well with others                                 -1.10 (.65)   -1.65 (.85)   -1.76 (.99)
--   Tactlessness                                                       -1.26 (.72)   -1.46 (.98)   -1.69 (.82)
--   Flexibility of thought                                             1.16 (.69)    1.92 (.70)    1.76 (.67)
--   Unawareness of the limiting factors in a situation                 -1.83 (.30)   -1.75 (.55)   -1.17 (.98)
--   Inadequate verbal skills                                           -.84 (.37)    -1.10 (.75)   -1.80 (.88)
--   Poor deductive reasoning                                           -.84 (.31)    -1.18 (.57)   -1.81 (.55)
--   Good inductive reasoning                                           1.11 (.84)    1.30 (.68)    1.48 (.79)
--   Aspiring to high levels of professional achievement                1.50 (.77)    1.50 (.80)    1.50 (.68)
43   Ability to innovate                                                1.07 (.78)    1.71 (.73)    1.85 (.88)
44   Ability to make appropriate judgments                              1.78 (.79)    2.18 (.66)    2.50 (.90)
45   Adaptability                                                       1.12 (.69)    1.98 (.68)    1.80 (.63)
46   Inability to gain confidence of others                             -1.08 (.98)   -1.73 (.99)   -3.00 (.97)
49   Open-minded                                                        1.33 (.68)    1.65 (.65)    1.37 (.95)
--   Lack of persuasiveness                                             -.79 (.40)    -1.25 (.58)   -1.48 (.90)
58   Lack of foresight                                                  -.93 (.30)    -1.13 (.70)   -1.39 (.94)
60   Ingenuity                                                          1.45 (.80)    1.46 (.66)    1.88 (.70)
61   Unreliable                                                         1.25 (.40)    2.24 (.66)    2.11 (.81)
64   Emotional stability                                                1.30 (.80)    1.50 (.35)    1.74 (.85)
68   Perseverance                                                       1.15 (.71)    1.50 (.90)    1.85 (.68)
79   Inability to complete assignments and consistently meet deadlines  -1.23 (.44)   -1.88 (.99)   -2.50 (.99)
80   Consistent performance at top ability                              2.23 (.57)    2.01 (.75)    2.23 (.69)
81   Lack of insight into behavior of others                            -.86 (.50)    -1.12 (.63)   -2.02 (.99)
84   Good professional skills                                           1.47 (.47)    1.68 (.65)    1.69 (.67)
85   Industrious                                                        1.70 (.81)    [illegible]   1.90 (.74)
91   Aggressiveness                                                     1.36 (.71)    1.52 (.89)    1.73 (.99)
92   Accepts authority                                                  1.48 (.85)    1.42 (.88)    1.37 (.78)

^a Discriminal values in parentheses.
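
The overlap percentages for the industrial management groups follow directly from the item counts given above. The short sketch below reproduces them; the function name is illustrative.

```python
def percent_shared(n_common, n_salient):
    """Percentage of a group's salient items that are common to all groups."""
    return round(100 * n_common / n_salient)


# Salient-item counts reported for the industrial management groups.
salient = {"freshman": 43, "senior": 47, "professional": 46}
common_to_all = 35

shared = {g: percent_shared(common_to_all, n) for g, n in salient.items()}
# Items in each group's salient set that were not common to all three groups.
unmatched = {g: n - common_to_all for g, n in salient.items()}
```

The `unmatched` counts are the bases for the "single-matched" percentages, so those percentages rest on only 8 to 12 items per group.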

Another way of examining the notion of a 
common occupational image is through the 
degree of correspondence among the scale 
values for the 35 common items. A high degree 
of correspondence would indicate that the 
ratings of importance assigned to each trait 
description are quite similar for the three 
groups. The scale values and discriminal dis- 
persions of the 35 items for the three groups 
are presented in Table 1. First, the pairwise 
rank order correlations among the 35 scale 
values were computed. The Spearman rho 
value for the freshman and senior arrays was 
.69, while the correlation estimate for the 
freshman and professional groups was .69. The 
correlation between scale values for the senior 
and professional groups was .87. These values 
complement the results of the item overlap 
analysis. Taken together, these results indi- 
cate substantial correspondence among the 
three levels of professional experience with
respect to occupational image or stereotype, 
although the question of a shift in image with 
experience requires an overall test which is 
now described. 

The results of the overall test (Marks, 
1968) bore out the conclusions of a common 
occupational image for the three groups. To 
apply this test, the dispersion matrix of the 
35 standardized variables was decomposed into 
between-group and within-group dispersion 
matrices for the total sample, that is, the freshman,
senior, and professional industrial management
groups. A principal component analysis
was applied to the total sample dispersion
matrix and the principal components extracted.
The within-group dispersion matrix of
the principal variables was computed using
$C S_w C'$, where $C$ is the matrix of characteristic
vectors and $S_w$ is the within-group
dispersion matrix of standardized variables.
$C S_w C'$ was then diagonalized using the
Gram-Schmidt process. The stepwise statistics
$P_i = d_{iiw}/d_{ii}$, where $d_{ii}$ is the $i$th characteristic
root of the total dispersion matrix
and $d_{iiw}$ is the $i$th diagonal element of the
diagonalized $C S_w C'$, were computed for the
first five components. The remaining roots
were quite small. The test statistics and their
corresponding beta distribution parameters
and significance levels are presented in
Table 2.


TABLE 2

STEPWISE CRITERIA AND THEIR SIGNIFICANCE LEVELS

P_i     Parameters    Significance level
.984    227/2, 1      [illegible]
.995    226/2, 1      .56
.989    225/2, 1      .20
.997    224/2, 1      [illegible]
.996    223/2, 1      .63

None of the test statistics reached the .05 
level of significance which indicates that the 
three groups can be considered to attribute a 
common profile with respect to shape to the 
personal characteristics of the occupational 
incumbent. 
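
The tabled significance levels can be approximated under one reading of the test. If each P_i is referred to the lower tail of a beta distribution with parameters (a, 1), the cumulative probability has the closed form P_i^a. That reading is an assumption made for this sketch and is not a statement of the procedure in Marks (1968). It comes within about .01 of the two Table 2 entries used below, a difference that is unsurprising given that the P_i are reported to only three decimals.

```python
def beta_lower_tail(p, a):
    """P(X <= p) for X ~ Beta(a, 1); the density a * x**(a - 1) integrates to p**a."""
    return p ** a


# (P_i, first beta parameter) pairs as printed in Table 2.
rows = [(0.995, 226 / 2), (0.996, 223 / 2)]
levels = [round(beta_lower_tail(p, a), 2) for p, a in rows]
```

Under this reading, a P_i close to 1 (little of a component's variance lying between groups) yields a large tail probability, which is consistent with the failure to reject a common profile.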

Electrical engineering. A similar analysis 
was undertaken for the three groups rating 
the typical electrical engineer. For the three 
levels of professional experience there were 
31, 33, and 36 items, respectively, which fell 
into the two most extreme categories on 
either end of the “importance” dimension. 
Of these total numbers of items, 29 items were 
common to all three groups. That is, 94% of 
the freshman items, 88% of the senior items, 
and 81% of the professional items were 
shared with the other two groups. As with 
the industrial management groups, this sug- 
gests considerable overlap of response or a 
common description of the electrical engineer. 
Of the very few items that did not match 
across all three groups, none of the freshman 
items, 75% of the senior items, and 43% of 
the professional items were shared with at 
least one of the other groups. As is obvious, 
only the senior and professional groups shared 
any single-matched items. On the basis of 
item overlap there appears to be a strikingly 
high degree of correspondence among the 
three levels of electrical engineering Ss with 
respect to characterization of the typical incumbent
of this occupation.

This is further borne out by the pairwise
correlations between scale values on the 29
common items. The scale values and discriminal
dispersions of these 29 items for the
three groups are presented in Table 3.


TABLE 3

SCALE VALUES AND DISCRIMINAL DISPERSIONS^a OF THE 29 COMMON ITEMS FOR THE
ELECTRICAL ENGINEERING GROUPS

No.  Item                                                               Freshman      Senior        Professional
1    Excellent educational training                                     2.65 (.60)    2.74 (.38)    2.80 (.42)
4    Efficient working habits                                           1.70 (.81)    2.11 (.61)    2.43 (.60)
6    High intellectual ability                                          1.60 (.62)    2.30 (.31)    3.00 (.81)
15   Conscientiousness                                                  1.37 (.50)    1.63 (.62)    2.00 (.61)
18   Self discipline                                                    1.36 (.46)    1.52 (.61)    1.80 (.60)
26   Inquisitiveness                                                    1.40 (.51)    2.06 (.61)    2.40 (.57)
32   Flexibility of thought                                             1.63 (.72)    1.71 (.81)    2.07 (.80)
34   Good mathematical skills                                           2.07 (.64)    2.31 (.61)    2.64 (.37)
37   Good mechanical comprehension                                      1.43 (.47)    1.59 (.61)    1.72 (.55)
38   Analytical skills                                                  1.68 (.43)    2.50 (.46)    2.61 (.63)
39   Poor deductive reasoning                                           -1.10 (.42)   -2.01 (.48)   -2.17 (.50)
--   Good inductive reasoning                                           1.42 (.50)    1.61 (.81)    1.96 (.80)
41   Aspiring to high levels of professional achievement                1.45 (.80)    1.42 (.57)    1.40 (.58)
44   Ability to make appropriate judgments                              1.41 (.79)    1.46 (.62)    1.61 (.60)
45   Adaptability                                                       1.55 (.90)    1.39 (.61)    1.46 (.60)
49   Open-minded                                                        1.23 (.81)    1.36 (.72)    1.56 (.70)
58   Lack of foresight                                                  -1.00 (.61)   -1.01 (.68)   -1.08 (.63)
60   Ingenuity                                                          1.73 (.45)    2.15 (.50)    2.91 (.48)
61   Unreliable                                                         -1.38 (.36)   -2.36 (.40)   -2.39 (.50)
62   Exacting                                                           1.63 (.28)    1.91 (.62)    2.16 (.48)
68   Perseverance                                                       1.35 (.46)    1.50 (.81)    1.68 (.62)
75   Inability to isolate relevant feature within a body of data        -1.01 (.69)   -1.09 (.68)   -2.01 (.63)
76   Resistant to change                                                -1.07 (.80)   -1.19 (.58)   -1.22 (.80)
79   Inability to complete assignments and consistently meet deadlines  -1.32 (.39)   -1.62 (.68)   -1.52 (.61)
80   Consistent performance at top ability                              2.21 (.32)    1.80 (.62)    1.91 (.70)
83   Well read within profession                                        1.36 (.68)    1.84 (.60)    2.31 (.69)
84   Good professional skills                                           1.77 (.61)    1.98 (.54)    2.53 (.67)
85   Industrious                                                        1.65 (.55)    1.70 (.57)    1.86 (.55)
93   Superior high school and college achievement                       1.70 (.41)    2.01 (.51)    2.24 (.57)

^a Discriminal dispersions in parentheses.


The correlation estimate between freshman
and senior scale values was .84, while the
correlation between freshman and professional
scale values was .79. The correlation between
senior and professional values was .98. Again,
these values complement the item overlap
analysis by showing considerable correspondence
between the rating of trait descriptions
for the groups taken in pairs. The overall test,
as described in the industrial management results,
was applied to the three electrical engineering
groups. The stepwise test statistics
and their beta distribution parameters and
significance values are presented in Table 4.
Only three characteristic roots were large
enough to be considered.


TABLE 4

STEPWISE CRITERIA AND THEIR SIGNIFICANCE LEVELS

P_i     Parameters    Significance level
.984    191/2, 1      .20
.991    190/2, 1      .42
.997    189/2, 1      .74


As with the industrial management groups,
there is no basis for rejecting the hypothesis
of a common profile for the three electrical
engineering groups. Taken together with the
item overlap and correlation data, these results
indicate a similar occupational image
for the groups varying in professional experience.

Self-description. Since the tests on the hy- 
potheses relating to professional experience 
supported the notion of a common occupa- 
tional image, the data for the three groups 
within an occupational title were pooled and 
the scale values of the 95 items for the two 
freshman self-descriptions were correlated with 
the scale values for the two pooled groups. 

The Spearman rho between self-descrip- 
tions of the industrial management freshmen 
and the scale values for the pooled ratings of 
the typical occupational incumbent by the 
three industrial management groups varying 
in professional experience was .69. The cor- 
responding value for the electrical engineer- 
ing groups was .72. These values may not be 
comparable to the rho values reported previ- 
ously for the two occupational titles. The 
latter values were based only on items com- 
mon to all three groups—that is, items fall- 
ing on the extremes of the “importance” 
dimension. Some attenuation of correlation 
estimates using such extreme categories would 
be expected. Nonetheless, the present rho 
values indicate substantial correspondence be- 
tween the way a freshman student “sees” or 
describes himself and the common image 
ascribed by Ss at varying levels of profes- 
sional experience to the typical occupational 
incumbent. 

For the most part, the discriminal disper- 
sions of the freshman self-descriptions tended 
to be larger than the same values for the 
three “experience” groups. The average ab- 
solute difference between these discriminal 
dispersions over the 95 items was .12. Only 
for 6 items was the freshman self-description 
discriminal dispersion less than the corre- 
sponding value for the “experience” groups. 
As such, self-descriptions demonstrated greater 
variability than other-descriptions. 


Occupational Title 


Since there were no significant differences 
among the three groups within an occupa- 
tional title varying in professional experience, 
the data within an occupational title were 
pooled and comparisons between the two
titles were made on these common images. 

In terms of item-overlap, of the 35 indus- 
trial management and 29 electrical engineer- 
ing items falling into the previously defined 
extreme categories of the importance dimen- 
sion, 20 items were common to both groups. 
That is, 57% of the trait-descriptions con- 
sidered salient by the industrial management 
Ss and 69% of such items for the electrical 
engineers were shared by the groups. This is 
somewhat, but not much, less than the per- 
centage overlap reported for the professional 
experience factor. 

Although there is a moderate amount of 
overlap in the traits used to describe the 
typical occupational incumbent by the two 
groups, the pattern of judged importance of 
these common traits is quite different for the 
groups. The Spearman rho between scale 
values over the 20 common items was only 
.20. For all 95 items the same index was still 
only .32. Both values are quite low and sug- 
gest that the vocational images of the two 
titles are noticeably distinct. 

In examining the trait descriptions that did 
not match, it was noted that the industrial 
management occupational image was more 
sensitive to and included trait descriptions 
involving interpersonal skills and traits, for 
example, inability to work well with others, 
emotional stability, and self-assurance. It ap- 
peared that the industrial management image 
was more complex in that it included the per- 
sonality and interpersonal domains as well 
as the intellective, training, and performance 
domains found in the electrical engineering 
image. 


Social Desirability 


The last aspect of the study investigated 
the relationship of the ratings assigned to the 
trait descriptions for the two occupational 
titles to the independent judgments of the 
social desirability of these traits. Several 
methods for examining this relationship have 
been proposed (Edwards, 1957), and the 
problems in interpretation of such correlation 
estimates have been discussed by Norman 
(1967). In this study, the item SD scale 
values (based upon judgments of 150 Ss) 
were correlated over the 95 items with the
item scale values of both the industrial man- 
agement and electrical engineering groups 
separately. Most of the criticism leveled 
against this index by Norman (1967) is 
avoided by interpreting these correlations as 
reflecting properties of the items only, and 
not as involving S variability. 

The Spearman rho’s were .63 for industrial 
management and .43 for electrical engineer- 
ing. Thus, a substantial relationship obtains 
between SD and “person” ratings for the in- 
dustrial management title, and to a lesser 
extent for the electrical engineering group. 

In terms of self-description, the rho values 
for the SD and freshman self-rating arrays— 
both industrial management and electrical en- 
gineering—were .64 and .61, respectively. 
These values were not noticeably different 
from those involving “other-description.” 
Whatever the role of SD in self- and other-description,
it appears to emerge equally in
both types of tasks. Some possibilities re- 
lating to this role are discussed later. 


DISCUSSION 


The results obtained here are similar to 
those obtained from previous studies where 
different and broader occupational titles were 
used and shorter time intervals, and perhaps 
less qualitatively different levels of experience, 
were examined (O’Dowd & Beardslee, 1960). 
For one thing, amount of professional experi- 
ence appears to have little effect upon an in- 
dividual’s image of the incumbent of an oc- 
cupation for which the individual is in train- 
ing or has been trained—at least for the 
domain of traits and its elements sampled in 
this study. In particular, the notion that oc- 
cupational images converge over time or with 
experience to some sort of norm—which in 
this study was assumed to be the image held 
by the professional group—appears untenable. 
The number and kinds of traits and their 
patterning which characterize the average 
freshman’s occupational image, regardless of 
title, are not much different, at least statisti- 
cally, from those which characterize the pro- 
fessional’s image. On this basis it seems rea- 
sonable to conclude that the average college 
freshman beginning his occupationally relevant 
education, his college major, possesses a fairly
accurate image—assuming the professionals 
know what they are talking about—of the 
typical incumbent of the intended occupation. 

Despite some overlap, the common images 
of the two occupational titles studied were 
noticeably different in certain respects. First, 
the industrial management image appeared 
more extensive and complex than that of elec- 
trical engineering. The industrial management 
image, aside from including the ability, train- 
ing, and performance factors found in the 
electrical engineering image, stressed the im- 
portance of the personality and interpersonal 
characteristics of the individual. To perform 
successfully in this area the individual must 
show personality traits and interpersonal 
skills which would permit him to deal ef- 
fectively with and perhaps influence other 
people. He must be sensitive to the motives 
or needs of others, accurately interpret their 
behavior, and exhibit self-behaviors that do 
not conflict with his evaluation of these mo- 
tives and behaviors of others. Interestingly 
enough, guile or duplicity as an interpersonal 
strategy—as found in the so-called Machiavel- 
lian personality—is not rejected, nor is it ac- 
cepted, as part of the industrial management 
image. As would be expected with such an 
interpersonal orientation, verbal skills were 
highly valued. 

Second, the pattern of importance of traits 
common to both groups was different. The 
electrical engineering groups assigned higher 
importance to intellective and cognitive skills 
and their utilization, for example, educational 
training, ingenuity, inquisitiveness, and flexi- 
bility of thought, than did the industrial man- 
agement Ss. Again, this reflects a tendency for 
the electrical engineers to be oriented more 
toward individual or problem-solving activities 
than the industrial management groups. For 
the electrical engineers it is sufficient to be 
bright, well-schooled, ingenious, and persistent 
in one’s approach to problems. Any success 
accruing to the individual will come from 
these characteristics rather than from the 
interpersonal sphere. 

As to self-image and its relation to occupational
image, on the basis of the large discriminal
dispersions, considerable individual
differences in self-description were noted. It is
likely that the patterns or profiles of these
self-characterizations for each person also 
show considerable individual differences. If 
so, it may be possible to break down the total 
sample of Ss engaging in self-rating into sub- 
sets of Ss internally homogeneous with respect 
to profile, that is, sets which represent dis- 
tinct types of people. Despite these large ob- 
served and hypothesized individual differences, 
the correlation values obtained in this study 
indicate that the freshman self-descriptions 
are similar to the common images of the oc- 
cupational titles. It may be that this cor- 
respondence between self and occupational 
images is due to common variance attributable 
to social desirability, a point considered next. 
In any case, there appear to be marked differ- 
ences between the ways college freshmen en- 
rolling in electrical engineering and industrial 
management view and describe themselves, 
with these two self-images being highly cor- 
related with the corresponding occupational 
images. Because of these differential correla- 
tions it is unlikely that all the correspondence 
between self- and occupational-images can be 
accounted for in terms of SD. 

The significant correlations between both 
SD ratings and self-descriptions, and SD 
ratings and occupational descriptions, indicate 
that the kind and pattern of traits considered 
descriptive of self and other in this case cor- 
respond to what Ss also consider to be de- 
sirable behavior in general. The work of Jack- 
son and Singer (1967) suggests that the no- 
tion of SD is indeed a complex one being in- 
fluenced—among other things—by content of 
the items, sex of Ss, and individual differences. 
More importantly, they have demonstrated 
SD to be a multidimensional construct. These 
results again indicate the need for considering 
individual points of view or types in the study 
of self- or other-descriptions, and, perhaps, 
the inappropriateness of averaging SD ratings 
over a sample of Ss (Tucker & Messick, 
1963). In this study, however, interest and, 
thus, conclusions were limited to the gross as- 
sociation between the item parameter of SD 
and person ratings. It is apparent from the 
results that much of the self- and other-charac- 
terizations studied can be described in terms 
of gross judgments of socially desirable be- 
haviors. That this finding is not adequate for 
completely explaining self and occupational
images has been noted. A reasonable conclu- 
sion in this respect is that S’s perceptions of 
himself and the typical incumbent of a se- 
lected occupation frequently reflect socially 
desirable behaviors, but that it is unlikely 
that these perceptions are solely determined 
by socially desirable response tendencies. 

An important question originally set out 
to be answered concerned the extent to which 
occupational and self-images are related to 
vocational behaviors, such as vocational
choice, satisfaction, or performance. On the 
basis of the present results the answer seems 
to be “not much.” For one thing, occupa- 
tional images apparently form quite early, 
probably in high school or earlier, and are 
accurate in the sense of matching the profes- 
sional’s images. Since the students’ average 
self-image tends also to coincide with the 
occupational norm of the major in which they 
are enrolled, if the self-concept model is ac- 
curate, little dissatisfaction on the part of the 
students with a current vocational choice, 
little attraction to another vocational area, or 
little vocational uncertainty can be expected. 
Unfortunately, our knowledge relating to col- 
lege change of major forces us to reject this 
expectation (Marks, Webb, & Strickland, 
1967). 

One possible factor in such observed dis- 
enchantment with or change in vocational 
preference or choice among college students 
is the notion that the students themselves are 
changing or, in terms of the present develop- 
ment, that their self-image is changing (Plant, 
1962). Since the occupational images of a 
given student probably remain fixed, this
hypothesized change in self-image implies a 
reordering of occupational preferences by the 
individual thus leading to a change in his 
choice of vocation and major. Under this 
interpretation the self-concept model is con- 
sistent and can account for vocational be- 
havior and choice. 

An alternative position is that trait de- 
scriptions, either self- or other-, are not 
specific enough to permit adequate prediction 
of vocational preferences or behaviors. For 
example, simply describing a person as bright, 
aggressive, emotionally stable, etc., may not 
be sufficiently precise to yield sensitive or 
accurate predictions of that individual’s voca-
tional behavior. Although self- and other-
images, as studied here, might be valuable 
adjuncts, the study of vocational preference, 
choice, and behavior might profitably be ex- 
tended to include rather specific vocational or 
job activities. Here a college student might 
also be asked to express his knowledge of 
specific job characteristics, activities, or tasks, 
for example, what the typical occupa-
tional incumbent does both on and off the job.
These approaches—the self-concept model and 
knowledge of specific job activities and be- 
haviors—when taken together, should provide 
a more rigorous examination and prediction 
of occupational criteria. 


COLOR CODING EFFECTS IN COMPATIBLE AND
NONCOMPATIBLE DISPLAY-CONTROL 
ARRANGEMENTS 


GARY K. POOCK 1 


Naval Postgraduate School, Monterey, California 


An experiment was designed to assess the effect of color coding in compatible 
and noncompatible display-control arrangements. Forty Ss viewed four ar- 
rangements of display-control panels with color coding or no color coding, and 
with displays and controls arranged in either a compatible or noncompatible 
arrangement. The Ss’ task was to shut off the display as fast as possible for 80 
trials. Color coding was more effective when displays and controls were ar- 
ranged in a noncompatible fashion, and had no effect when display and 
control were arranged in a compatible manner. The results support the im- 
portance of compatibility in display-control location. 


Previous researchers have investigated the 
effects of color coding in various types of 
visual displays. Green and Anderson (1956) 
investigated color coding in a visual search 
task while Jones (1962) provided a survey of 
previous research by numerous authors and 
discussed the various parameters important in 
color coding. Smith (1962, 1963), Smith and 
Thomas (1964), and Smith, Farquhar, and 
Thomas (1965) have reported a series of in- 
vestigations involving color coding in in- 
formation displays. Their research, along with 
that of other researchers, indicates potential 
benefits of color coding depending on its use 
and the type of display. In another research 
study by Chapanis and Lockhead (1965), the 
effectiveness of sensor lines showing linkages 
between displays and controls was tested to 
find out if lines drawn from the control to 
the display provided any difference in time 
to respond to a display light on display panels 
up to 10” × 12”. They investigated various
panel arrangements using the sensor lines and 
using compatible versus noncompatible ar- 
rangements of display and control. Their re- 
sults suggested it is more important to make 
the relative location of displays and controls 
compatible than it is to use lines showing 
which control relates to which display. When
the displays were compatible, sensor lines con-
tributed nothing to speed of response.


1 The author expresses his appreciation to Dennis
R. Jordan, LCDR, USN, and Gerald A. Vick, Major,
USA, for their help in procurement of the apparatus
and data. Requests for reprints should be sent to the
author, Department of Operations Analysis, Naval
Postgraduate School, Monterey, California 93940.

Given the potential uses of color coding,
the purpose of this research was to in-
vestigate the use of color coding in com-
patible and noncompatible arrangements of 
displays and controls. The specific objective 
was to determine if color coding would pro- 
vide different times to control response, espe- 
cially in a noncompatible display-control ar- 
rangement. 


METHOD


Forty Army, Navy, and Marine male officers with 
normal color vision served as experimental Ss. 
The Ss were divided into four groups of 10 each 
and were tested for 80 trials in one of four display- 
control arrangements. A partially nested analysis of 
variance was used, with Ss nested in one of the four 
display-control arrangements but common to the 80 
trials. The compatible and noncompatible arrange- 
ments with color coding are shown in Figure 1. 
The panels were 12” × 24”. The displays were stan-
dard red, green, blue, and white Christmas tree
lights and the controls were standard light switches 
with mounting plates located directly beneath the 
lights in a corresponding fashion. The null position 
of all switches was toward S. 

The compatible arrangement with color coding is 
shown on the left side of Figure 1. The compatible 
arrangement with no color coding was exactly the 
same except all lights were red and all switches and 
mounting plates were white. However, the upper 
left-hand light was still controlled by the upper 
left-hand switch and likewise for the other three 
display controls. In the noncompatible color coded 
arrangement shown on the right side of Figure 1, a 
colored switch still controlled its respective colored 
light, but the location of displays and controls was 
not compatible. The noncompatible arrangement with
no color code again used red lights and white
switches, but S had to remember which switch con-
trolled which light. The display-control arrangement
was the same as the noncompatible with color.


[Figure 1 panels: COMPATIBLE DISPLAY (left) and NON-COMPATIBLE DISPLAY (right).]

FIG. 1. Color coded compatible and noncompatible
displays used. (Lettering is included for easier in-
terpretation and did not actually appear on displays.)

At the start of a test session, E demonstrated the 
use of the display-control panel an S was to use. 
Standardized instructions informed each S that he 
was to start each trial with his hand on the black 
marker at the bottom center of the control panel. 
The S’s task would be to shut off a particular light as 
fast as he could by pushing the corresponding switch 
away from him. If he pushed an incorrect switch, 
the light would stay on until the correct switch was 
activated. After 12 practice trials and any questions, 
the testing period began. Each light occurred ran- 
domly an equal number of times in each half of the 
trials. Time to correct response was measured in 
tenths of a second. This parameter had been sug- 
gested by Chapanis and Lockhead (1965) as being 
more meaningful operationally, even though it may 
be more complex psychologically than time to first 
response. 
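
The presentation rule described above (each light occurring randomly, but an equal number of times in each half of the 80 trials) can be illustrated with a minimal sketch. The color labels simply identify the four lights (on the uncoded panels they would stand for four positions), and the function names and the `seed` argument are hypothetical conveniences, not part of the original procedure.

```python
import random
from collections import Counter

LIGHTS = ["red", "green", "blue", "white"]  # labels for the four display lights
TRIALS_PER_HALF = 40                        # 80 trials split into two halves


def balanced_half(rng):
    """Return 40 trials in which each light appears equally often, in random order."""
    per_light = TRIALS_PER_HALF // len(LIGHTS)  # 10 presentations per light
    half = [light for light in LIGHTS for _ in range(per_light)]
    rng.shuffle(half)
    return half


def trial_sequence(seed=None):
    """Build an 80-trial sequence that is balanced within each half."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    return balanced_half(rng) + balanced_half(rng)


sequence = trial_sequence(seed=1)
first_half, second_half = sequence[:40], sequence[40:]
print(Counter(first_half))
print(Counter(second_half))
```

Shuffling within each half, rather than across all 80 trials, is what keeps the light frequencies equal in the first and second 40 trials.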


RESULTS 


Figure 2 shows the average time to correct
response for each of the panel combinations
used. The graph is very similar to the cor-
responding one of Chapanis and Lockhead
except the current data provided uniformly
higher response times because of the different
controls used.


FIG. 2. Average times to correct response. (Each
point is average of 100 data points.)

It was desired to examine the effects of 
color coding and display-control arrangement 
only after initial learning was apparently 
over. An overall analysis of variance indicated 
a significant difference (p < .005) between 
the first and last half of the 80 trials. This 
result and observation of the data indicated a 
significant practice effect had occurred. How- 
ever, further analysis on the second half of 
the trials indicated no significant improve- 
ment in response times. Thus Table 1 shows 
the analysis of variance on the last 40 trials, 
and should more reasonably indicate what 
significant effects remain in the data after the 
first 40 trials of performance. Therefore, the 
remainder of the paper will be concerned with 
Table 1. 

The most outstanding result of this in- 
vestigation was the effect of the combination 
of color coding and panel arrangement where 
the very marked beneficial effect of color 
coding on a spatially noncompatible panel 
was demonstrated. The reader’s attention is 
directed to the top two curves of Figure 2, 
where the data of the last 40 trials indicate 
an improvement in response times of 40 to
50% when color coding is used on noncom-
patible panel arrangements. This result pro-
vides strong support for the use of color
coding when displays and controls cannot be
arranged in a spatially compatible relation-
ship. Color coding had no effect on response
times when the displays were compatible, as
seen in the two bottom curves in Figure 2.
Further, color coding did not improve re-
sponse levels on noncompatible displays to
those of compatible displays.


TABLE 1

ANALYSIS OF VARIANCE ON SECOND HALVES OF THE DATA
(TRIALS 41 THROUGH 80)

Source of variation                           df        MS        F
Between Ss (S)                                39    309.15
  Between panels (P)                           3   2729.59   25.40*
    Compatible vs. noncompatible (C)           1   4613.80   42.94*
    Color code vs. no color code (Col)         1   1740.97   16.20*
    Interaction: C × Col                       1   1833.99   17.07*
  Between Ss within panels (S w/P)            36    107.45
Between trials within Ss                    1560      8.97
Total                                       1599

* p < .001.

There was a significant difference between 
compatible and noncompatible displays with 
the compatible displays providing the faster 
response times. The significance of the effect 
of compatibility is supported by further 
analysis which showed approximately 55% 
of the variance between panels was due to 
the compatibility of the panel arrangement. 
This strong effect of compatibility is in agree- 
ment with Chapanis and Lockhead where 
they showed 77% of the variance between 
panels was due to the compatible-noncom- 
patible effect. 

The main effect of color coding was also 
significant and did account for more of the 
total variance between panels than the sensor 
lines of Chapanis and Lockhead. Approxi- 
mately 20% of the variance between panels 
was due to color coding in this experiment 
versus approximately 6% due to sensor lines 
in their experiment. 
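
As a check on the arithmetic, the F ratios and the share of between-panel variance attributed to each effect can be recomputed from the mean squares printed in Table 1. The sketch below assumes that each F uses the "between Ss within panels" mean square as its error term and that the quoted percentages are ratios of sums of squares; both assumptions are inferred from the table and are not stated by the author.

```python
# Mean squares as printed in Table 1 (trials 41 through 80).
MS_ERROR = 107.45    # between Ss within panels, 36 df (assumed error term)
DF_PANELS = 3
MEAN_SQUARES = {
    "panels": 2729.59,         # between panels, 3 df
    "compatibility": 4613.80,  # compatible vs. noncompatible, 1 df
    "color": 1740.97,          # color code vs. no color code, 1 df
    "interaction": 1833.99,    # C x Col, 1 df
}

# F ratio for each effect: effect mean square over the error mean square.
f_values = {name: round(ms / MEAN_SQUARES_ERROR, 2)
            for name, ms in MEAN_SQUARES.items()
            for MEAN_SQUARES_ERROR in (MS_ERROR,)}

# Sum of squares between panels = MS x df; each 1-df effect has SS equal to its MS.
ss_panels = MEAN_SQUARES["panels"] * DF_PANELS
shares = {name: MEAN_SQUARES[name] / ss_panels
          for name in ("compatibility", "color", "interaction")}

for name, f in f_values.items():
    print(f"F({name}) = {f:.2f}")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of between-panel sum of squares")
```

Under these assumptions the shares come to roughly 56% for compatibility and 21% for color coding, which is in line with the "approximately 55%" and "approximately 20%" reported in the text.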

Finally, further analysis using Duncan’s 
multiple range test (Winer, 1962) for pair- 
wise differences among the four panels showed 
no significant difference between the com- 
patible panels at the .05 level. All other pair- 
wise differences between the panels were sig- 
nificant (p < .05). 


DISCUSSION


The compatibility of the display and con- 
trol with respect to relative spatial location is 
the most important point revealed in this in- 
vestigation and confirms that same conclusion 
of Chapanis and Lockhead (1965). Color 
coding appears to have a proportionally more 
significant effect than the use of sensor lines. 

The conclusions from this investigation sug- 
gest, in view of the panels used here, that one 
should first concentrate on compatibility with 
respect to display-control location. However, 
if this is not possible, the use of color coding 
will improve significantly the time to correct 
response in a noncompatible display. 


A study was conducted to relate interest patterns and consumer product
preferences. The SVIB and a product preference questionnaire were adminis-
tered to a sample of 239 male business students. Significant interest differences
were found between Ss preferring a convertible automobile to a sedan auto-
mobile, a trip to Yellowstone to a trip to Las Vegas, and a savings account to
common stock. These differences were aligned on a people-nonpeople interest
axis, with the latter product in each pair being associated with relatively more
people-oriented interest patterns, and suggest that meaningful interest-product
preference relationships may exist.


Numerous attempts have been made to re- 
late various psychological concepts to con- 
sumer purchasing patterns. Examples are the 
studies of Wells (1961) in predicting con- 
sumer behavior by the use of attitudes; Krug- 
man and Hartley (1960) in the learning of 
tastes; Tucker and Painter (1961) and West- 
fall (1962) in correlating personality factors 
with product preferences and usage rates; and 
Losciuto and Perloff (1967) in relating cogni- 
tive dissonance to consumer preferences. 

However, no attempts have been made to 
relate interest patterns to consumer product 
preferences. This is surprising since interests 
are viewed as “complex psychological struc- 
tures controlling choices of the use one will 
make of his time” (Tyler, 1965, p. 206); 
hence, interest patterns should be related to 
decisions to purchase several types of con- 
sumer products. 

To investigate these relationships, two pilot 
studies were undertaken using the Strong 
Vocational Interest Blank (SVIB) and a 
product preference questionnaire. These stud- 
ies revealed that scores on SVIB physical 
science scales (Group II), technical and 
skilled trade scales (Group IV), social service 
scales (Group V), and business and sales 
scales (Groups VIII and IX) differentiated 
between respondents who preferred certain 
products presented in a forced-choice situa- 
tion. 

The primary purpose of the present study 
was to replicate these pilot studies and cross-
validate preliminary findings with a similar
and larger sample. A second purpose was to
suggest the potential of previously unexplored
psychological concepts as determinants of
consumer behavior.


1 Requests for reprints should be sent to Robert A.
Peterson, School of Business Administration, Uni-
versity of Minnesota, Minneapolis, Minnesota 55455.


METHOD
Subjects 


The sample consisted of 239 male students en- 
rolled in the introductory marketing course at the 
University of Minnesota. The mean age was 21.8 
yr. with an SD of 1.5 yr. 


Instruments 


The instruments used were the SVIB and a product 
preference questionnaire. The SVIB was used be- 
cause it has consistently proved to be valid and re- 
liable in vocational counseling and personnel selec- 
tion for over four decades (Campbell, 1966). Fur- 
thermore, it has been successfully applied in several 
other contexts (Campbell & Johansson, 1966; Knapp, 
1964; Thorndike, Weiss, & Dawis, 1968; Whitehorn 
& Betz, 1960). 

The product questionnaire consisted of structured 
questions concerning product preferences. Products 
were presented in pairs on a 5-point scale ranging 
from strongly preferring Product A to strongly pre-
ferring Product B. A sixth category was provided for
“prefer neither” responses. Criteria used in selecting
product pairs were (a) preferences for particular
products were related to specific interest patterns in
the pilot studies; (b) product pairs represented a
variety of consumer alternatives (travel, investment,
a major durable good purchase); and (c) the products
were feasible alternatives and similar in price. Unde-
sirable product pairs were eliminated by pretests;
only those meeting the above criteria were included
in this study.


TABLE 1

PRODUCT ALTERNATIVES AND NUMBER OF SUBJECTS
PREFERRING EACH PRODUCT IN A PAIR

Product pair                  Number of Ss preferring
Savings account                          73
Common stock                            144

Convertible automobile                  104
Sedan automobile                        106

Yellowstone trip                         65
Las Vegas trip                          135


TABLE 2

SVIB MEANS AND STANDARD DEVIATIONS FOR SAVINGS ACCOUNT PREFERRERS (n = 73)
AND COMMON STOCK PREFERRERS (n = 144)

                                          Savings preferrers    Stock preferrers
Scale                                            M        SD         M        SD     Diff.
Group II
  Architect                                   22.0       9.2      18.5  [illeg.]  [illeg.]
  Mathematician                               13.6       7.9      10.2       9.0     3.4**
  Physicist                                   12.8       9.5       9.6      10.8      3.2*
  Chemist                                     19.3      11.6      17.0      13.3       2.3
  Engineer                                [illeg.]      10.7      20.3  [illeg.]       2.4
Group IV
  Carpenter                                   21.3  [illeg.]      16.8  [illeg.]     4.5**
  Forest service man                          20.0      12.8      19.4      11.9        .6
  Farmer                                      29.9      10.2      28.1       9.4       1.8
  Math-science teacher                        30.7      10.4      29.6  [illeg.]  [illeg.]
  Printer                                     34.1       9.5      30.0       9.8     4.1**
  Policeman                                   22.3       9.3      20.7       9.4       1.6
Group V
  Personnel manager                           30.3       9.3      35.1      10.6  [illeg.]
  Public administrator                        34.8       9.0      38.4      10.7     -3.6*
  Rehabilitation counselor                    32.8      10.1      34.0      10.4      -1.2
  YMCA secretary                              33.8  [illeg.]      37.6      12.4     -3.8*
  Community recreation director               34.0      10.7      37.5      12.4     -3.5*
  Social worker                               28.0      12.8      30.9      12.8      -2.9
  School superintendent                       19.2       9.9      20.9  [illeg.]      -1.7
  Minister                                    12.4      14.0      12.1      13.7        .3
Group IX
  Sales manager                           [illeg.]      10.5      39.0      11.4     -3.7*
  Real estate salesman                        39.2       8.7      40.7       8.5      -1.3
  Life insurance salesman                 [illeg.]       9.9      34.6      10.7      -2.9

  Chamber of commerce executive               39.2       8.6      44.2      10.4   -5.0***
  Credit manager                              40.9       9.5      44.9       9.5    -4.0**
  Occupation introversion-extraversion        44.7      11.2      39.7      11.0  [illeg.]
  Occupational level                          55.4       6.4      58.2       8.1     -2.8*
  President, manufacturing firm               26.4       8.5      29.5       9.5     -3.1*
  Adventure^a                                 60.6       9.9      63.9       9.1     -3.3*
  Public speaking^a                           52.1      10.0      55.7      10.0     -3.6*

^a Basic scales.
* p < .05.
** p < .01.
*** p < .001.


Procedure 


The SVIB and product questionnaire were ad- 
ministered as unrelated instruments in a single test- 
ing session; thus, students were led to believe they 


were participating in two separate studies. A sub- 
sequent check indicated that the participants did not 
associate the two instruments. 

Three pairs of product alternatives, listed in Table 
1, were investigated by comparing mean SVIB scores 
of Ss strongly or moderately preferring one product 
with the mean scores of those Ss preferring the other. 
“Indifferent” and “prefer neither” categories were 
excluded from the analysis. Table 1 also presents the 
number of Ss strongly or moderately preferring each 
alternative. 
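
The source does not state which significance test was applied to the mean differences. Purely as an illustration of how such a comparison can be run from the summary statistics printed in the tables, the sketch below computes an unequal-variance (Welch) t statistic for the Mathematician scale in Table 2 (savings preferrers: M = 13.6, SD = 7.9; stock preferrers: M = 10.2, SD = 9.0). The choice of Welch's test, the function name, and the use of the table's group sizes (73 and 144) as the exact n for this scale are all assumptions.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and approximate df from group summary statistics."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group A's mean
    var_b = sd_b ** 2 / n_b  # squared standard error of group B's mean
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Mathematician scale, Table 2: savings preferrers vs. stock preferrers.
t, df = welch_t(13.6, 7.9, 73, 10.2, 9.0, 144)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A pooled-variance t test would give a similar value here because the two standard deviations are close; the Welch form is used only because it does not require that assumption.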


RESULTS 


Tables 2, 3, and 4, respectively, report se- 
lected SVIB scale means and standard devia-
tions for the analyzed pairs of products. In
general, the results of the present study con- 
firmed those of the pilot studies. Specifically: 

(1) The Ss preferring savings accounts ob- 
tained higher scores on SVIB physical science 
and technical-skilled trade scales (Groups II 
and IV) and lower on social service, business, 
and sales scales (Groups V, VIII, and IX) 
than Ss preferring common stock. 


(2) The Ss preferring a trip to Yellowstone
scored higher on technical-skilled trade scales
(Group IV) and lower on sales and verbal-
linguistic scales (Groups IX and X) than Ss
preferring a trip to Las Vegas.

(3) The Ss preferring a convertible auto-
mobile obtained higher mean scores on social
service and sales scales (Groups V and IX)
and lower scores on physical science and tech-
nical-skilled trade scales (Groups II and IV)
than Ss preferring a sedan automobile.

Only those SVIB scale groups differentiat-
ing between subsamples in the pilot studies
are reported in the tables. However, all of the
individual scales within the reported groups
are presented to illustrate the consistency of
the results.

In addition, selected Occupational, Nonoc-
cupational, and Basic Interest scales (Camp-
bell et al., 1968) are presented at the bot-
tom of each table. These supplemented the
preceding grouped scales and assisted in the
interpretation of the results.


TABLE 3

SVIB MEANS AND STANDARD DEVIATIONS FOR CONVERTIBLE PREFERRERS (n = 104)
AND SEDAN PREFERRERS (n = 106)

                                      Convertible preferrers    Sedan preferrers
Scale                                            M        SD         M        SD     Diff.
Group II
  Architect                                   17.2       9.7      21.4       8.2  [illeg.]
  Mathematician                                8.0       8.2      14.0       8.3   -6.0***
  Physicist                                    7.4      10.2  [illeg.]      10.1  [illeg.]
  Chemist                                     14.9      12.9      19.6      12.0    -4.7**
  Engineer                                    18.4      11.5  [illeg.]      10.3  [illeg.]
Group IV
  Carpenter                                   15.6      12.0  [illeg.]      11.9  [illeg.]
  Forest service man                          19.0      12.0      20.6      12.0      -1.6
  Farmer                                      26.6      10.0  [illeg.]       9.3   -4.6***
  Math-science teacher                        29.1      10.2      30.5  [illeg.]      -1.4
  Printer                                     29.9      10.1  [illeg.]       9.6     -3.3*
  Policeman                                   21.9       9.3      21.0       9.3        .9
Group V
  Personnel director                          34.5  [illeg.]      33.2      10.4       1.3
  Public administrator                        38.1      10.7      36.3      10.2       1.8
  Rehabilitation counselor                [illeg.]      10.7      31.6       9.7  [illeg.]
  YMCA secretary                              40.3      12.4      32.3      10.7  [illeg.]
  Social worker                           [illeg.]      13.2      27.3      12.4  [illeg.]
  Social science teacher                      36.1      11.0      32.9       9.5  [illeg.]
  School superintendent                   [illeg.]      10.9      19.6      10.5       1.6
  Minister                                    12.9      13.9      10.7      14.0  [illeg.]
Group IX
  Sales manager                               40.3      10.0      36.6      11.2  [illeg.]
  Real estate salesman                        41.5       7.9      39.5       8.4       2.0
  Life insurance salesman                     35.8       9.9      31.9      10.0  [illeg.]

  Biologist                                   10.0       9.3      14.5      11.0    -4.5**
  Chamber of commerce executive               45.2      10.2      39.9       9.3  [illeg.]
  Community recreation director               39.7      12.0      32.8      10.6    6.9***
  Occupation introversion-extraversion        38.4  [illeg.]      44.8      10.7  [illeg.]
  Sales^a                                     59.3       8.6      55.7  [illeg.]     3.6**
  Adventure^a                                 64.9       8.1      60.6      10.1  [illeg.]
  Recreational leadership^a                   56.9       8.1  [illeg.]       9.2    4.6***
  Public speaking^a                       [illeg.]       9.9  [illeg.]       9.9  [illeg.]
  Merchandising^a                             63.1       6.7      60.5  [illeg.]     2.6**

^a Basic scales.
* p < .05.
** p < .01.
*** p < .001.


TABLE 4

SVIB MEANS AND STANDARD DEVIATIONS FOR TRIP TO YELLOWSTONE PREFERRERS (n = 65)
AND TRIP TO LAS VEGAS PREFERRERS (n = 135)

                                      Yellowstone preferrers  Las Vegas preferrers
Scale                                            M        SD         M        SD     Diff.
Group IV
  Carpenter                                   21.8  [illeg.]      16.7      12.4  [illeg.]
  Forest service man                          24.4      12.2      17.3      11.5  [illeg.]
  Farmer                                  [illeg.]       9.8      27.6       9.8  [illeg.]
  Math-science teacher                        32.9      10.8      28.4      11.0     4.5**
  Printer                                 [illeg.]      10.6      31.5       9.8  [illeg.]
  Policeman                               [illeg.]       9.3      20.3       9.1  [illeg.]
Group IX
  Sales manager                               34.3  [illeg.]      40.2      10.4   -5.9***
  Real estate salesman                        38.0       8.7      41.7       8.2    -3.7**
  Life insurance salesman                     30.5      10.2      35.8       9.9   -5.3***
Group X
  Advertising man                             28.1      10.0      33.6      10.3  [illeg.]
  Lawyer                                      26.4       8.1      29.2       8.4     -2.8*
  Author-journalist                       [illeg.]       6.5      28.2       7.4    -3.1**

  Biologist                                   15.8      10.4      10.3      10.0  [illeg.]
  Chamber of commerce executive               39.7       9.2      44.2      10.1    -4.5**
  Liberalism-conservatism                     41.3       9.3      45.4       9.3    -4.1**
  President, manufacturing firm               26.8       9.1      29.4       8.6     -2.6*
  State department interpreter                25.1      11.6  [illeg.]  [illeg.]   -6.1***
  Mechanical^a                                49.6       9.3      44.9       9.1  [illeg.]
  Nature^a                                    46.5      10.5      39.2       8.7  [illeg.]
  Adventure^a                                 60.4      10.5      64.1       8.9     -3.7*
  Recreational leadership^a                   52.9       9.1  [illeg.]       8.3     -2.8*

^a Basic scales.
* p < .05.
** p < .01.
*** p < .001.


DIscussION AND CONCLUSIONS 


A content analysis of the interests of re-
spondents preferring different alternatives sug-
gests these interests can be associated with a
people-nonpeople interest continuum. The Ss
who preferred savings accounts, trip to Yel-
lowstone, and sedan automobile reported more
nonpeople interests, while Ss who preferred
common stock, trip to Las Vegas, and con-
vertible reported more people-oriented in-
terests. The supplemental scales reported at
the bottom of Tables 2-4 support this con-
tention. Particularly noteworthy are the re-
spective scores on Occupational introversion- 
extraversion, a direct measure of people 
interests (lower scores) versus nonpeople in- 
terests (higher scores), and the Public speak- 
ing basic scale. In two of the product pairs, 
respondents on the people-end of the interest 
continuum scored significantly lower on the 
former scale and higher on the latter scale, 
and in the third pair the mean differences were 
in the expected directions. 

Moreover, the supplemental scales lend face 
validity to the study findings. For instance, 
Ss who preferred savings accounts scored sig- 
nificantly lower on the Occupational level 
scale than Ss who preferred common stock.
These scores illustrate the lower socioeco-
nomic interests of Ss who prefer savings ac- 
counts and are consistent with the research 
findings of Katona (1964). Additionally, the 
largest mean score difference between Ss who 
preferred a trip to Yellowstone and those who 
preferred a trip to Las Vegas was on, ap- 
propriately, the Nature scale. Finally, Ss pre- 
ferring common stock, the Las Vegas trip, 
and convertible automobile obtained signi- 
ficantly higher scores on the Adventure scale. 
The differences on this scale were expected 
from the nature of the product alternatives 
and add credence to the concept of an interest- 
consumer preference relationship. 

Although the results must be interpreted 
with respect to the sample employed, the in- 
terdependence of the interest score differences, 
and the specific products used, they neverthe- 
less suggest that meaningful relationships may 
exist between interest patterns and product 
preferences. In fact, the interest homogeneity 
of the sample and the methodology employed 
probably increase the practical significance 
of the results. 

Given that these results can be replicated 
across random consumer samples with actual 
purchases as the criteria, interest differences 
may prove to be much greater. If this is so, 
interest patterns may ultimately become im- 
portant variables in the explanation and pre- 
diction of purchasing behavior. Further re- 
search should focus upon two areas: (a) In- 
terests should be investigated with respect to 
the amount of variance they account for in 
actual purchase decisions compared with other 
purchase determinants, and (b) attempts
should be made to analyze similarities of in-
terest patterns across product categories.


EFFECTIVENESS OF SVIB ACADEMIC INTEREST SCALES
IN PREDICTING COLLEGE ACHIEVEMENT 1


The effectiveness of various SVIB academic interest scales in predicting first
semester grades for freshman males at the University of Massachusetts was 
determined. Both the Rust and Ryan and the Campbell and Johansson scales 
contributed significantly, albeit modestly, to a multiple correlation coefficient 
consisting of high school rank and SAT scores in predicting academic per- 
formance. A single-item, self-evaluation rating scale failed to predict grade 
point average significantly. Although the degree of relationship between the 
interest scales and grades tended to be somewhat greater for “marginal” stu- 
dents, the r’s were not significantly different from those obtained with more 
able students. The use of modified, “placement” instructions did not greatly 
affect the mean scores or the magnitude of the correlations. 


In recent years, several new scales have been 
developed from Strong Vocational Interest 
Blank (SVIB) items to aid both in predicting 
college achievement and in understanding the 
motivational and temperamental factors as- 
sociated with academic success (Campbell & 
Johansson, 1966; Martin, 1964; Rust & Ryan, 
1954). Each of these scales has survived 
cross-validation study in at least one setting. 

Rust and Ryan (1954) developed separate 
scales to predict overachievement, normal 
achievement, and underachievement at Yale 
University. These scales significantly differ- 
entiated between various groups of over-, 
normal, and underachievers at both Yale Uni- 
versity and, more recently, Harvard University 
(McArthur, 1965). 

Martin (1964) constructed a series of aca- 
demic interest scales (both long and short 
forms) from SVIB items for males and fe- 
males enrolled in liberal arts and males en- 
rolled in engineering and mines at the Uni- 
versity of Pittsburgh. The scales significantly 
contributed to a multiple correlation based in 
part upon Scholastic Aptitude Test (SAT) 
scores and high school rank in predicting first 
year grades for succeeding classes at the 
University of Pittsburgh. 

1 The research reported herein was performed
pursuant to a contract with the Office of Education
of the United States Department of Health, Educa-
tion and Welfare while the author was on the staff
at the University of Massachusetts.

2 Requests for reprints should be sent to the
author, Counseling Center, University of Wisconsin,
415 Gilman Street W., Madison, Wisconsin 53706.


Finally, Campbell and Johansson (1966) 
developed the Academic Achievement (AACH) 
scale to differentiate between high and low 
achievers in the College of Liberal Arts at the 
University of Minnesota. Although the AACH 
scale correlated significantly with first year 
grade point average in the cross-validation 
study at Minnesota (r = .36), it did not sig- 
nificantly contribute to a multiple correlation 
coefficient consisting of a scholastic aptitude 
test and high school rank in predicting grades. 
The scale provides some insights into the per- 
sonal, motivational characteristics associated 
with high and low grades. This new nonoc- 
cupational interest scale has been added to 
the profile of the 1966 revision of the SVIB 
(Strong & Campbell, 1966). 

Will these various interest scales be effec- 
tive in predicting achievement in a new 
academic setting? The present study was pri- 
marily addressed to this question. Four spe- 
cific questions concerning the practical ap- 
plication of the academic interest scales were 
asked: 

1. Are the SVIB scales more effective than 
a single-item, self-rating scale? Holland and 
Lutz (1968), in particular, have argued that 
simple, direct questions might produce re- 
sults as effective as, or possibly more effective 
than, long lists of inventory items. 

2. Are the SVIB academic interest scales 
more highly correlated with college grades for 
“marginal” students than for superior or aver- 
age students? Clark (Clark, 1961; Clark & 
Campbell, 1965) has presented data suggesting
that when learning ability was “just
adequate,” the correlation between interests 
and achievement was more pronounced. 

3. Is the degree of relationship affected if 
the students take the inventory with instruc- 
tions that the results may be used for place- 
ment purposes? While it has been long known 
that the SVIB profile may be rather easily 
faked, the effect of such distortion is not well 
established. Some studies (e.g., Ruch & Ruch, 
1967) suggest that “real life” incentives to 
fake may actually improve the predictive 
validities of inventories. If the S knows for
what purposes the tests will be used, he will 
be better able to indicate the specific role 
which he is willing to play in that particular 
situation (Hathaway, 1960). 

4. Do the SVIB scales significantly con- 
tribute to a multiple correlation coefficient 
based in part upon SAT scores and high 
school rank in predicting college achievement? 
To be most helpful in both counseling and 
selection, the interest scales should aid in 
accounting for that part of the variance in 
college grades not already accounted for by 
readily available intellective predictors. 


METHOD 


Subjects 


The final sample consisted of 290 freshman males 
enrolled in the College of Arts and Sciences (A&S) 
and 98 freshman males enrolled in the School of 
Business Administration (SBA) at the University of 
Massachusetts who participated in the summer 
orientation program in 1967. Eleven of the A&S 
students and two of the SBA students originally 
tested were not included in the final sample because 
of failure to enroll in college, failure to complete the 
first semester, or lack of SAT scores. 


Measuring Instruments 


Each student was asked to complete the SVIB and 
a single item, self-evaluation rating scale as part of 
the regular precollege testing program administered 
during freshman summer orientation sessions. The 
self-evaluation rating scale consisted of a single, 
horizontal line with 100 percentage points drawn on 
it. The student was instructed to mark as closely as 
possible the percentage point which best represented 
the percent of other freshman male students whom 
he felt that he would surpass in terms of his first 
semester grade point average. 

The SVIB was scored to yield the following seven 
academic interest measures: (1-4) Overachievers, 
Normalachievers, Underachievers, Overachievers minus
Underachievers (O minus U) (Rust & Ryan,
1954); (5-6) Academic Interest Scale (AIS): Liberal 
Arts Males (LAM), 1959 version, Long and Short 
Forms (Martin, 1964), and (7) Academic Achieve- 
ment Scale (Campbell & Johansson, 1966). 

The 1966 revision of the SVIB (Strong & Camp- 
bell, 1966) was used. As 109 of the 400 items on the 
SVIB were dropped, both the Rust and Ryan and 
the Martin scales, which were based on the old 
form of the SVIB, have fewer items on the new 
form. The effect of this reduction in the number of 
items on the intercorrelation of the old form with 
the new form and upon test-retest reliability was 
determined by means of a sample of 101 young 
adults who took the old form of the SVIB (which 
includes all the items scored on the new form) twice 
over a 30-day interval.³

The instructions for both the SVIB and the self- 
rating scale were modified for one-fourth of the 
total sample. The modified instructions informed 
the students that the results might be used in placing 
them in advanced courses. The specific instructions 
are given below. The routine SVIB instructions read 
as follows: “Among other things, research has shown 
that this test is helpful in making vocational and 
educational plans. The test enables the student to 
compare his interests with those of people employed 
in various occupations. High scores indicate occupa- 
tional similarity; low scores indicate dissimilarity. 
The test results serve as an index of the type of work 
which you will find interesting. The results will be 
used in discussing occupational and educational plans 
with you.” 

The modified SVIB instructions read as follows: 
“Among other things, research has shown that this 
test is a fairly good index of academic motivation. 
Students who receive high ‘academic motivation’ 
scores generally do well in their college courses. 
Students who obtain low ‘academic motivation’ 
scores often experience difficulty in their courses. The 
test results may serve as a measure of your motiva- 
tion or desire to do well in your course work. As 
such, the results may be used to guide your place- 
ment in some of our challenging courses.” 

Similarly, the instructions for the self-rating scale 
were varied. The routine instructions began as fol- 
lows: “Your estimate of your first semester academic 
performance will be helpful to your counselor in 
discussing your program of courses with you.” 

The modified instructions began as follows: “Your 
estimate of your first semester performance will be 
used as an index of your desire to do well in your 
course work. As such, it may be used as a guide in 
placing you in some of our more challenging courses.” 

The tests were administered to the students in 
groups of 30 to 60 by University personnel. Every 
fourth test folder contained the modified instruc- 
tions. The sections with the modified instructions 
were not read aloud. Of the 290 A&S students in- 


3 These data were generously supplied by David P.
Campbell, Director, Center for Interest Measurement 
Research. The composition of the sample is described 
elsewhere (Strong & Campbell, 1966, p. 27). 
TABLE 1
TEST-RETEST RELIABILITIES OF REVISED AND OLD FORMS OF SVIB
ACADEMIC INTEREST MEASURES
Columns (left to right): Scale; No. of items; Test M, SD; Retest M, SD;
Test-retest reliability; Intercorrelation of revised and old forms
Rust and Ryan Scales* 
Underachievers 
Revised 37 —2.6 4.2 —2.9 4.2 67 98 
Old 43 —2.9 4.3 — 3:3 4.6 68 
Normalachievers 
Revised 24 af 3.6 dd 3.8 WP .96 
Old 29 BS 4.2 fe 4.3 nie 
Overachievers 
Revised 23 2.1 3.3 1S 3.2 14 93 
Old 34 3.1 3.9 2.2 4.0 fll 
O minus U 
Revised 4.8 5.8 4.4 Ser Bee 97 
Old 6.1 6.3 535 6.3 1? 
Martin Scales (LAM)b
AIS-Short
Revised 24 1SoM 2.9 13.5 2.9 BL .96 
Old 31 17.9 3.4 17.6 od 14 
AIS-Long
Revised 83 46.9 6.0 45.9 6.5 76 97 
Old 107 60.1 Jal 58.2 7.6 76 
AACH Scale 55 48 12 47 12 88 


Note.—n = 101. 

a Responses weighted −1, 0, or +1.

b Responses weighted 0 or 1.

c Data from SVIB Manual (Strong & Campbell, 1966).


cluded in the final sample, 68 received the modified 
instructions. Of the 98 SBA students in the final 
sample, 26 received the modified instructions. 

Predicted grade point averages (PGPA), computed
by means of a multiple regression equation based upon
Converted Class Rank (secondary school rank) and
SAT Verbal and Mathematics scores, were obtained
for all Ss from the Dean of Admissions Office
(Glover, 1963). The current version of this formula
for freshman males enrolled in either Arts and
Sciences or Business Administration is: PGPA = .01
(SAT-Verbal) + .038 (Converted Class Rank) − .545.
Both SAT-Verbal and Converted Class Rank are
expressed as T scores with an M of 50 and an SD of
10.
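
The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as quoted in the text; the function names and the raw-score mean and standard deviation used in the T-score conversion are hypothetical and are not taken from the study.

```python
def to_t_score(raw: float, mean: float, sd: float) -> float:
    """Convert a raw score to a T score (M = 50, SD = 10)."""
    return 50.0 + 10.0 * (raw - mean) / sd


def predicted_gpa(sat_verbal_t: float, class_rank_t: float) -> float:
    """PGPA = .01 (SAT-Verbal) + .038 (Converted Class Rank) - .545,
    with both predictors expressed as T scores, as quoted in the text."""
    return 0.01 * sat_verbal_t + 0.038 * class_rank_t - 0.545


# A student exactly at the mean on both predictors.
print(round(predicted_gpa(50, 50), 3))

# Hypothetical raw-score norms (mean 500, SD 100), for illustration only.
sat_t = to_t_score(600, mean=500, sd=100)
print(round(predicted_gpa(sat_t, 60), 3))
```

Because the class-rank weight is nearly four times the SAT-Verbal weight, a one-SD difference in class rank moves the predicted GPA by .38, against .10 for a one-SD difference in SAT-Verbal.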


Data Analysis 


The relationship between the academic interest 
scales and first semester grade point average (GPA) 
was determined for the following groups of Ss: 
A&S students (routine instructions), SBA students 
(routine instructions), A&S and SBA students (rou- 
tine instructions), A&S and SBA students (modified 
instructions), and high-, middle-, and low-predicted 
GPA groups. 

The A&S and SBA students were combined in some 
instances to increase the size of n. This combination
appears to be justified in that the two groups of 


students were enrolled in essentially the same pro- 
gram of courses for the first semester. 

The A&S students were divided as equally as possible
into three levels of PGPA: a high-predicted
GPA, “superior” group (PGPA = 2.3 or higher;
n = 81), a middle-predicted GPA, “average” group
(PGPA = 2.1 or 2.2; n = 85), and a low-predicted
GPA, “marginal” group (PGPA = 2.0 or lower;
n = 56).

Both zero order and multiple correlation coef- 
ficients were computed. The significance of the in- 
crease in multiple R due to the inclusion of addi- 
tional variables was tested by means of the analysis 
of variance procedure described in McNemar (1962, 
p. 284). 
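
The test of an increase in multiple R can be sketched as follows. This is the standard F ratio for the increment in R squared when predictors are added, which is presumably what the cited procedure amounts to; the function name is hypothetical, and the example values (r = .19 for PGPA alone, R = .34 with one interest scale added, n = 222) are illustrative readings of the tables, not a recomputation from raw data.

```python
def f_for_r_increase(r_full, r_reduced, n, k_full, k_reduced):
    """F ratio for the gain in multiple R when predictors are added.

    r_full, r_reduced: multiple R with and without the added predictors
    n: number of subjects
    k_full, k_reduced: number of predictors in each equation
    Returns (F, (df_numerator, df_denominator)).
    """
    df_num = k_full - k_reduced
    df_den = n - k_full - 1
    gain = (r_full ** 2 - r_reduced ** 2) / df_num
    error = (1.0 - r_full ** 2) / df_den
    return gain / error, (df_num, df_den)


# PGPA is treated here as a single predictor (an assumption).
f_ratio, df = f_for_r_increase(0.34, 0.19, n=222, k_full=2, k_reduced=1)
print(round(f_ratio, 2), df)
```

The F ratio is then referred to an F table with the returned degrees of freedom.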


RESULTS 


The use of the revised version of the SVIB 
did not appreciably affect the relative scores 
of Ss (see Table 1). The intercorrelation of 
the old and the revised forms was in no case 
less than .93. The test-retest reliabilities were 
virtually the same for both the old (long) and 
the revised (short) forms of the scales. 

The means and standard deviations of 
variables for all groups of Ss are shown in 
TABLE 2
MEANS AND STANDARD DEVIATIONS OF VARIABLES FOR ALL GROUPS
Groups (columns, left to right): Arts and Sciences (n = 222); School of
Business Administration (n = 72); A&S plus SBA (n = 294); Placement
(n = 94); High PGPA (n = 81); Middle PGPA (n = 85); Low PGPA (n = 56)
Variable
Self-rating scale M 68.1 63.9 67.1 68.3 72.2 66.2 
SD Dee 12.2 192.8} 11.9 11.4 11.9 

Rust and Ryan Scales 
Underachievers M es) —.2 —.2 —.4 —.6 —.8 
SD 4.8 4.2 4.7 4.4 4.9 5.0 
Normalachievers M —1.6 “3 —1.2 —1.2 —1.4 —1.7 
SD Sef 3.4 Sai oH 3.9 3.8 
Overachievers M 3 ‘ 3 By 1.0 re 
SD 2.9 2.8 2.8 3.4 Qik 3.0 
O minus U M a5) aD 4 5 15) 9 
SD 6.2 5.0 5.9 5.8 5.9 6.5 

Martin Scales (LAM) 
ATS-Short M 11.8 14.7 1225 12.7 1:22 11.4 
SD 3.0 6.9 4.5 4.1 25 3.3 
ATS-Long M 43.9 41.4 43.3 43.1 45.4 42.8 
SD 6.0 6.3 6.2 6.2 5.8 5.4 
AACH Scale M 46.1 34.9 43.3 44.1 48.7 44.0 
SD 11.4 11.6 A283 12.6 10.4 1252. 
Grade Point Average | M Dal 1.8 2.0 2.0 ee 2.0 
SD Ba 6 4 6 ai 6 
Predicted GPA M 2.2 Zt 2.2 De, 2.4 Dee, 
SD vi D) EZ, ae, el al 

Low PGPA (n = 56)



64.9 
12.4 


4.3 
—1°8 
S53 
—.4 
2.8 
—1.5 
Jo) 


11.8 
3.0 
43.3 
6.9 
45.4 
11.0 
1.9 
Table 2. It may be noted that most of the
students rated themselves well above average 
in predicting their first semester class stand- 
ing. The mean scores on the various academic 
interest scales are roughly comparable to the 


mean scores of relevant groups of college 
students reported in the literature (Campbell 
& Johansson, 1966; McArthur, 1965). 

The intercorrelations of the predictor vari- 
ables for the A&S students are reported in 











TABLE 3 
INTERCORRELATIONS OF ACADEMIC INTEREST SCALES FOR ARTS AND SCIENCES STUDENTS 
AIS- AIS- 

Scale U N O O-U Short Long AACH | PGPA 
Self-rating scale — .06 — .03 11 .10 05 alos one Wait 
Rust and Ryan scales 

Underachievers (U) —.42** | —.21** | —.88** | —.20** | —.14* —.09 —.11 

Normalachievers .03 poaee Be 10 —.09 .04 

Overachievers (O) .64** .42** .40** 200m 105 

O minus U y cone Lone .20** 18** 

Martin Scales (LAM) 

AIS-Short 00%", 40** 06 

ATS-Long 34** 11 
AACH scale ws 








Note.—n = 222.
* p < .05.
** p < .01.
TABLE 4
CORRELATION BETWEEN ACADEMIC INTEREST SCALES AND FIRST SEMESTER
GRADE POINT AVERAGE
Groups (columns, left to right): Arts and Sciences (n = 222); School of
Business Administration (n = 72); A&S plus SBA (n = 294); Placement
(n = 94); High PGPA (n = 81); Middle PGPA (n = 85); Low PGPA (n = 56)
Predictor
Self-rating scale .00 a2 .05 19 —.11 .00 = 07 
Rust and Ryan Scales 
Underachievers (U) — .18** —.01 — .15** 02 —.17 —.13 —.26 
Normalachievers .08 07 104 — .02 08 .03 .16 
Overachievers (O) 31** 07 20%" 16 19 .31** By 
O minus U .30** 25n BDO er .06 23" .26* AND a 
Martin Scales (LAM) 
ATIS-Short WP? 14 .06 ailS 07 .16 05 
AIS-Long .09 —.08 .08 .10 .06 01 .10 
AACH Scale alghe 02 PSR 29** Bits 14 15 
Predicted GPA 19% nook 24** .20 eld 18 —.16 





* p < .05.
** p < .01.


Table 3.⁴ The various measures of academic
interest showed low intercorrelations. None of
the r’s for the separate scales (excluding
scales which are based in part upon one of the
other scales, e.g., the O minus U score or the
Martin scales) exceeded .42. Inspection of
the item content for the various scales indicated
relatively little overlap (20-40%)
in the use of specific items. Surprisingly,
15-25% of the items which did overlap
were scored in the opposite direction.

The main findings in the study are reported 
in Tables 4 and 5. The academic interest 


4The intercorrelations of the predictor variables 
for the other groups of Ss, which are very similar to 
the intercorrelations for the A & S students, are 
tabulated in the final research report (Johnson, 1968). 


scales predicted first semester performance as 
effectively as the PGPA regression formula. 
All of the r’s tended to run fairly low, no r 
exceeding .40. 

The most successful academic interest scales 
for these students were the Rust and Ryan 
scales, particularly the Overachiever scale and 
the O minus U score, and, secondly, the 
Campbell and Johansson AACH scale. The 
Martin scales and the self-evaluation scale did 
not significantly correlate with GPA for any 
of the groups. 

While the direction of the relationship between
the SVIB scales and GPA for the three
ability groups supported Clark’s (1961) finding
that the relationship was greater for the
lower, or “marginal,” students, the r’s were


TABLE 5

MULTIPLE CORRELATIONS BASED ON PREDICTOR VARIABLES WHICH SIGNIFICANTLY (p < .05)
INCREASED THE DEGREE OF RELATIONSHIP WITH GRADE POINT AVERAGE

Group                              n     Predictors                                    R
Arts and Sciences                  222   Overachievers Scale, PGPA                     .34
Arts and Sciences                  222   AACH Scale, PGPA                              .24
Arts and Sciences plus School
  of Business Administration       294   O minus U Score, PGPA, Underachievers Scale   .38
Arts and Sciences plus School
  of Business Administration       294   AACH Scale, PGPA                              .28
Middle PGPA (Arts and Sciences)    85    Overachievers Scale, PGPA                     .39
not significantly different from each other. A
greater number of significant r’s were found
for the low and middle groups, however,
than for the high group.
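
The text does not say how the r’s for the ability groups were compared. One common way to compare correlations from independent groups is Fisher’s r-to-z transformation, sketched below as an assumption rather than as the author’s procedure. The function name is hypothetical; the example uses the Underachievers correlations read from Table 4 for the high (n = 81) and low (n = 56) PGPA groups.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Compare two independent correlations via Fisher's r-to-z.

    Returns (z, two_tailed_p) for the difference r1 - r2.
    """
    z1 = math.atanh(r1)  # Fisher transformation of r1
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


z, p = fisher_z_difference(-0.17, 81, -0.26, 56)
print(round(z, 2), round(p, 2))
```

With groups of this size, a difference of about .09 between two r’s falls far short of conventional significance, which is consistent with the statement in the text.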

The correlations were not any higher for 
the motivated “placement” group than for 
the “discussion” groups. Only Campbell and 
Johansson’s scale predicted GPA significantly 
for this group. 

Finally, as shown in Table 5, the magnitude 
of the relationship between the predicted 
GPA and GPA was significantly increased by 
the addition of one or two of the academic 
interest scales in at least several instances. 
The total amount of variance accounted for 
(15 or 16% at most) is still relatively small, 
but, nonetheless, some of the error in predic- 
tion has been reduced. 


DISCUSSION 


With the exception of Campbell and Johans- 
son’s scale, the test-retest reliabilities of the 
remaining academic interest scales are not 
sufficiently high for routine individual in- 
terpretation. Although the test-retest relia- 
bilities are higher than the reported split-half 
reliability coefficients (Martin, 1964; Rust & 
Ryan, 1954), presumably due to the heteroge- 
neous nature of the item content, the relia- 
bilities still average only in the .70 to .75 
range. If the scales could be increased in 
length by using items of comparable validity, 
the test-retest reliabilities could be substan- 
tially improved (Abrahams, 1967). Until such 
an event, the scales may be most safely used 
for group interpretations or for forming (not 
testing) hypotheses regarding individuals. 
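
The effect of lengthening a scale can be illustrated with the Spearman-Brown formula. This is a standard projection that assumes the added items are parallel to the existing ones; it is offered as a sketch and is not necessarily the procedure of the source cited above. Function names are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when a scale is lengthened by length_factor."""
    k, r = length_factor, reliability
    return k * r / (1.0 + (k - 1.0) * r)


def length_factor_needed(reliability: float, target: float) -> float:
    """How many times longer the scale must be to reach the target reliability."""
    return target * (1.0 - reliability) / (reliability * (1.0 - target))


# A scale with reliability .70, doubled in length.
print(round(spearman_brown(0.70, 2), 3))

# Lengthening needed to move from .70 to .85.
print(round(length_factor_needed(0.70, 0.85), 2))
```

Under these assumptions, a scale in the .70 to .75 range would need to be more than doubled in length to approach the reliability usually wanted for individual interpretation.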

The modest reliabilities attenuate the maxi- 
mum validities possible for the scales. Despite 
this limitation, the Rust and Ryan scales, 
together with the Campbell and Johansson 
scale, possessed promising validity for use 
with the students in this study. The Over- 
achievers scale, O minus U score, and AACH 
scale each correlated as highly as predicted 
GPA with first semester grades. 
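
The attenuation argument can be made concrete with the classical correction formula. The sketch below assumes classical test theory and, by default, a perfectly reliable criterion, which first semester grades are not; the function names and example values are illustrative only.

```python
import math


def max_validity(rel_predictor: float, rel_criterion: float = 1.0) -> float:
    """Upper bound on an observed validity coefficient,
    given the reliabilities of predictor and criterion."""
    return math.sqrt(rel_predictor * rel_criterion)


def disattenuated(r_xy: float, rel_x: float, rel_y: float = 1.0) -> float:
    """Validity corrected for unreliability in x (and optionally y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


# A scale with reliability .70 cannot correlate above about .84
# with any criterion, however valid its true scores.
print(round(max_validity(0.70), 3))

# An observed r of .30 for such a scale, corrected for predictor unreliability.
print(round(disattenuated(0.30, 0.70), 3))
```

The correction is modest here: unreliability in the scales lowers the observed r’s somewhat but does not by itself account for validities in the .20s and .30s.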

The ineffectiveness of the Martin scales in 
predicting GPA needs some explanation. The 
composition of the student body and/or the 
courses comprising the first-year schedule apparently
varied sufficiently from that of the
University of Pittsburgh to prevent successful
cross-validation of the scales. The scales them- 
selves, although constructed in a manner 
somewhat different from either the Rust and 
Ryan or Campbell and Johansson scales, do 
not appear to be at fault in that they did 
effectively predict academic performance for 
successive samples at Pittsburgh. 

As a one-item measure, the self-evaluation 
scale may have lacked adequate reliability to 
predict grade performance. The fact that 
nearly all the students rated themselves above 
average suggests that the students’ self-per- 
ceptions were not very accurate at best. Both 
Torrance (1954) and Stone (1962) report a 
similar tendency on the part of students to 
overestimate their academic potential. Tor- 
rance also found very little relationship 
between self-predicted grades and achieved 
grades. Stone did not report the predictive 
validities of the students’ self-ratings. 

The lack of a significant relationship be- 
tween self-predicted and obtained grades con- 
trasts sharply with Young’s (1954) and 
O’Hara’s (1966) findings that self-ratings 
added significantly to multiple Rs consisting 
of various aptitude measures in predicting 
academic success. In both the Young and the 
O’Hara studies, however, the students made 
their self-estimates sometime after school had 
started; in fact, their grades had already been 
in part determined by examinations which 
they had taken. It is worth noting that stu- 
dents do respond to feedback they receive 
within their environment; however, the tech- 
nique loses any meaning as a preenrollment 
index of potential academic achievement. The 
main value of preenrollment ratings, as ob- 
served by Torrance (1954), may be to in- 
volve the students more deeply in test in- 
terpretation and to assist the counselor in 
determining how resistant the student will be 
in accepting test results. The scale may reveal 
the student’s wish to succeed but not neces- 
sarily reflect any added effort on his part to 
insure success. 

Although clear-cut statistically significant 
differences among the three predicted GPA 
groups failed to emerge, the findings were in 
the expected direction. The difficulty in cross-validating
results found with such subdivisions
of the total sample has been clearly illustrated
by Hakel (1966). Perhaps more
carefully refined PGPA groups, as well as a 
larger n, would have produced more defini- 
tive results in the present study. The rela- 
tionship between ability and achievement at 
different motivation levels also needs further 
exploration (French, 1958). 

The modified instructions indicating that 
the test results might be used in advanced 
placement apparently did not greatly influence 
the scores for the students. The means and 
standard deviations for both the “placement” 
and the “discussion” groups were approximately
the same. The predictive validity coefficients
were of approximately the same magnitude
for each group. If anything, the r’s
appeared to run slightly lower for the “place- 
ment” group than for the “discussion” group. 
This result fails to support the hypothesis 
that “sensible distortion,” which may occur 
in real-life situations (Gellerman, 1963), may 
actually increase the validity of the test 
scores. The findings are in accord with Walsh’s 
(1967, 1968) observation that validity of 
self-report is not greatly affected by incen- 
tives to distort. Perhaps more critical instruc- 
tions (e.g., results would be used in selection 
instead of advanced placement) or a different 
test-taking atmosphere (SVIB administered 
at the same time as entrance examinations) 
would have had a greater impact on the 
results. 

The success of the Rust and Ryan scales 
and the AACH scale in contributing sig- 
nificantly to a multiple R based in part upon 
aptitude and achievement variables in predict- 
ing college achievement at the University of 
Massachusetts is encouraging. The failure of 
nonintellective measures to cross-validate in 
predicting academic success in new settings is 
well recorded (Super & Crites, 1962). Al- 
though the size of the multiple R is only 
moderate, if extreme scores (e.g., plus and 
minus one standard deviation) are used as 
cut-offs, relatively accurate classification of 
successful or unsuccessful students would be 
possible (Taylor & Russell, 1939). With the 
collection of local cross-validation data, ex- 
pectancy tables for converting the scores of 
very low and very high scoring students into 
GPA probabilities may be profitably con- 
structed. 
Study of the content of the scales, although
possibly misleading if not supported by the- 
oretical assumptions, may serve as a source of 
hypotheses for additional research. The con- 
tent of the Overachievers scale, perhaps the 
single most efficient predictor, indicates that 
achievement beyond one’s predicted level is 
associated with items suggesting conservatism 
(playing safe, not loaning money), conven- 
tionality (lack novel ideas, work where can 
stay in one place), conscientiousness (plan 
work in detail), passive feminine interests 
(birdwatching, music teaching), and lack of 
mechanical interests (auto mechanic, adjust- 
ing a carburetor). According to McArthur 
(1965), items on the Rust and Ryan scales 
reflect “conscientious perseverance.” 

The above description agrees rather well 
with Nichols’ (1966) observation that stu- 
dents who get good grades are likely to be 
“compulsive and conforming.” The achieve- 
ment level among students may possibly be 
raised by selecting or training students on 
such characteristics. Other temperamental or 
motivational characteristics may be rewarded 
by modifying the criteria used for achievement 
within courses or by broadening the definition 
of achievement to include extracurricular ac- 
complishments. The personal qualities de- 
sired on the part of the students will depend 
upon the criteria of achievement established 
by the educational institution. Techniques for 
identifying and reinforcing both the persever- 
ing student and the creative student need to 
be developed. 
RATED ACCEPTABILITY OF MINERAL TASTE IN WATER: II.
COMBINATORIAL EFFECTS OF IONS ON QUALITY AND
ACTION TENDENCY RATINGS ¹


WILLIAM H. BRUVOLD 2 
University of California, Berkeley 
WILLIAM R. GAFFEY
California State Department of Public Health 


Previous research has shown that equal concentration simple solutions of 
minerals in water receive significantly different taste quality ratings. The 
differences in ratings appear to be accounted for by solution anions. The present 
research was designed to investigate the combinatorial effect of mineral anions 
on taste quality ratings given specially prepared water samples. Thirteen water 
samples were rated on two scales by 25 Ss. Results indicated that anions 
independently influence the general taste quality of mineralized water. The 
implications of these results for establishing limiting standards for minerals in
domestic water were discussed.


Research is in progress which aims to pro- 
vide the information needed to establish limit- 
ing standards for mineral content in domestic 
water in order to ensure potability for daily 
consumption (Bruvold, Ongerth, & Dillehay, 
1967). The current recommendation limiting 
mineral content made by the United States 
Public Health Service (1962) is in terms of 
total dissolved solids (TDS) as milligrams 
per liter (mg/l) as are the recommendations of 
the state of California. Standards for mineral 
content in domestic water expressed only as 
TDS may be reasonable at present; however, 
they may not remain so as evidence regard- 
ing the relationship between dissolved minerals 
and taste quality accumulates. 

Mineral content in water is composed pri- 
marily of calcium, magnesium, potassium, and 
sodium cations in combination with bicar- 
bonate, carbonate, chloride, nitrate, and sul- 
fate anions. Previous research (Bruvold, 1968; 
Bruvold & Pangborn, 1966) has amply dem- 
onstrated that mean taste quality ratings vary 


1 This research was supported in part by funds 
provided by the United States Department of In- 
terior, Office of Water Resources Research, as 
authorized under the Water Resources Act of 1964, 
by the University of California Water Resources 
Center, and in part by United States Public Health 
Service, National Institutes of Health General Re- 
search Support Award 5-S01-FR-5441 to the School 
of Public Health, University of California, Berkeley. 

2 Requests for reprints should be sent to William 
H. Bruvold, University of California, School of Pub- 
lic Health, Earl Warren Hall, Berkeley, California 
94720. 


significantly for the equal concentration min- 
eral solutions containing one cation and one 
anion. The differences between these ratings 
were attributed to anion effects since all bi- 
carbonate and sulfate solutions received mildly 
unfavorable ratings, all chloride solutions 
moderately unfavorable ratings, and all car- 
bonate solutions strongly unfavorable ratings. 
The magnitude of differences between ratings 
given these equal concentration solutions 
suggests that TDS standards may be inap- 
propriate. 

The present research was designed to in- 
vestigate how taste quality ratings are in- 
fluenced by various combinations of mineral 
ions since natural waters used for domestic 
consumption contain certain amounts of most, 
if not all, of the ions listed in solution. Such 
data can answer basic questions regarding the 
combinatorial effects of ions on taste quality 
ratings, and they can also be employed in a 
consideration of various methods for estab- 
lishing standards which limit mineral content 
in domestic water. 


METHOD


Twenty-five employees of the California State De- 
partment of Public Health served voluntarily as 
raters in this research. Sixteen of the raters were 
males, 9 were females, and all had served in 
previous water taste research. All individuals were 
selected because of their availability and willingness 
to participate; none was asked to participate on the 
basis of prior performance in rating the taste of 
water samples. 
TABLE 1
CHEMICAL ANALYSIS DATA FOR 13 WATER SAMPLES

Sample number    Na    Cl    CO3   HCO3   SO4   TDS by evaporation
 1                0     0      0      0     0       0
 2              176   125     31    195     0     432
 3              157     0     38    195   124     424
 4              161     0     42    348     0     374
 5              173   125      0    100   124     474
 6              428   375    132    305     0    1119
 7              400     0    138    348   377    1114
 8              422     0    312    525     0    1005
 9              400   380      0      0   376    1188
10              888   700    300    572     0    2212
11              786     0    264    680   751    2180
12              742     0    288   1464     0    1732
13              841   700      0      0   750    2380

Note.—Concentrations in milligrams per liter (mg/1). 

Twelve water samples were specially prepared
using reagent grade NaCl, NaHCO3, and Na2SO4 with
double distilled water. Sample concentrations were
designed to cover the meaningful TDS range found
in natural waters used for domestic supply, and to
vary widely in type and amount of anions present.
Double distilled water was also employed as one
sample in the study. Chemical analysis data for all
water samples obtained 2 wk. after preparation are
shown in Table 1. Sodium was the only cation used
since two earlier attempts to prepare samples containing
calcium and magnesium cations resulted in
uncontrolled carbonate precipitation. None of the 12
samples showed any sign of precipitation over the
entire duration of the study.


TABLE 2

MEANS AND STANDARD DEVIATIONS
FOR 13 WATER SAMPLES

Sample number; Quality ratings M, SD; Action tendency ratings M, SD
1 (een OT 7.41 Lisi,
2 6.32 1.32 6.80 1.48
3 6.40 eo 6.90 1.51
4 5.95 1.45 6.50 1.27
5 6.44 1.34 6.81 1.40
6 3.95 LOZ 4.50 1.59
7 4.58 1.13 5.24 1.28
8 3.76 OL 4.32 1.50
9 4.78 ez oh) 1.88
10 2.08 0.94 Die LoD
11 3.45 1.32 4.00 Toe
12 3.78 1.17 4.42 1.57
13 BOO) 1.03 3.76 1.60


A quality and an action tendency rating scale were
employed in this study. Construction of these scales
has been fully reported in an earlier paper (Bruvold,
1968). Both scales contained nine rating statements
and each statement referred directly to the taste of
water. Scale distance between individual items was
approximately equal for both sets of statements and
scale values were highest for statements indicating
the most favorable reaction to the water’s taste.

Each rater took part in three rating sessions
separated by a period of 1 wk. All 13 samples
were rated during each session. Order of sample
presentation was independently randomized before
each session, and a different sample code was employed
for each of the 3 wk. of the study. Rating
was performed alone in a small air-conditioned room
whose temperature was maintained at 72° ± 2° F.
Samples were served at room temperature in 100-ml.
beakers filled to the 75-ml. level. Each rater tasted
each sample three times, marked the appropriate
rating category for the quality and the action tendency
scale, rinsed thoroughly with Berkeley tap
water (85 mg/l TDS), and rested for 30 sec. as
gauged by a laboratory timer. This procedure was
repeated until all samples were rated. Numbers were
not used in conjunction with the rating scales; raters
simply marked the appropriate category for each
sample code and rating statement shown on the appropriate
data sheet. Quality statements were ordered
with the most favorable item at the top of the data
sheet, while action tendency items were listed in
reverse order, placing the most unfavorable statement
at the top of the data sheet.


RESULTS 


The 3 ratings given each water sample
by a rater on each scale were averaged, yielding
13 mean ratings for each scale and rater.
Equal appearing interval scale values for rat- 
TABLE 3
FREQUENCY DISTRIBUTIONS FOR MULTIPLE REGRESSION COEFFICIENTS AND TERMS
FROM THE INDIVIDUAL REGRESSION EQUATIONS
Irequency 7 I"requeney 
Multiple 2 Regression equation a ee C 
constants 

() ALT QO AT 

0,95-0,99 5 6 9,50-10,99 | 0) 

0,90-0,94 1 11 8,00— 9,49 4 6 

0,85=0,89 5 6 6,50— 7,99 9 13 

0),80-0,84 4 | 5.00— 6,49 10 4 

0,75-0,79 0) | 3,50- 4,99 | 2 

: f Na H¢ ic dy Cé )y Cl $0, 

Regression equation Pu en one 
terms 

() AT () AT ) AT Q) AT Q AT 
060 O89 4 2 0) Q () 0) 0) 0 0 0 
040 059 4 J 0 Q Q 0) Q) () 0 0 
000 029 ld 13 6 4 4 6 4 6 4 6 
= 030=— 001 4 5 19 16 12 15 17 16 19 18 
= 0600-031 () 2 () | 9 4 4 3 Z 1 





ing statements were used as individual nu- 
merical rating scores. Means and standard 
deviations for ratings given the 13 solutions
are shown in Table 2. Each entry in Table 2
is based upon 25 mean rating scores.

Stepwise multiple regression analyses
(Walker & Lev, 1953) were performed on the
mean ratings for each individual rater and
scale. The results of these 50 analyses, which
employed the scale rating as the
dependent variable and the ionic concentration
values shown in Table 1 as independent variables,
are summarized in Table 3. The same
regression analysis was performed upon all
325 mean ratings for each scale. The summary
regression equation for the quality
scale was Q’ = 6.81 + 0.020 (Na) − 0.008
(HCO3) − 0.022 (CO3) − 0.016 (Cl) −
0.011 (SO4), the multiple R associated with
this equation was 0.719, and eta was 0.738.
Analogous values for the action tendency
scale were AT’ = 7.22 + 0.011 (Na) − 0.005


TABLE 4

ANALYSIS OF VARIANCE FOR RATING SCALE RESULTS BY TDS CATEGORIES

                              Quality ratings                Action tendency ratings
Source               df      SS        MS        F           SS        MS        F

Between people       24    311.08                          454.89
Within people       300    889.20                          851.60
  Waters             12    655.33     54.61     67.42**    594.61     49.55     55.67**
    Linear trend      1    566.61    566.61    699.52**    512.17    512.17    575.47**
    Quadratic trend   1     43.79     43.79     54.00**     20.48     20.48     23.01**
    Cubic trend       1      0.55      0.55      0.68        0.81      0.81      0.91
    Quartic trend     1      2.79      2.79      3.44        4.59      4.59      5.16*
  Residual          288    233.87      0.81                256.99      0.89

Total               324   1200.24                         1310.49

* p < .05.
** p < .001.
FIG. 1. Lines of best fit for Q and AT scale ratings (abscissa: TDS as milligrams per liter).


(HCO3) - 0.016 (CO3) - 0.010 (Cl) - 0.007
(SO4), R = 0.659, and eta = 0.675.

Regression analyses of the relationship be- 
tween TDS values and mean scale ratings were 
also performed (Winer, 1962). Results of 
these analyses are shown in Table 4 and in 
Figure 1. Quadratic regression equations for
the TDS values were Q' = 7.50 - 0.0036
(TDS) + 0.0000008 (TDS)^2 and AT' = 7.71
- 0.0030 (TDS) + 0.0000005 (TDS)^2.


DISCUSSION


The multiple regression equations reported 
above involve only one cation, sodium, since 
earlier attempts to prepare solutions contain- 
ing additional common mineral cations re- 
sulted, as noted, in uncontrolled carbonate 
precipitation. The relatively large regression 
coefficient for sodium was likely due to the 
fact that there was a substantial direct rela- 
tionship between amount of sodium and TDS 
in the solutions here employed. The regression
coefficients for anions match earlier results
for simple solutions (Bruvold, 1968;
Bruvold & Pangborn, 1966) since the nega- 
tive coefficients are smallest for bicarbonate 
and sulfate, intermediate for chloride, and 
largest for carbonate. 

Multiple regression analyses demonstrated 
that combinatorial ionic effects on quality and 
action tendency ratings were adequately de- 
scribed by a model involving only a first de- 
gree function containing no interaction terms. 
The multiple correlation coefficients for the 
two scales were close enough, in light of the 
degrees of freedom involved, to the corre- 
sponding etas to show that adding higher de- 
gree or interaction terms to the five-variable 
function could account for only a very small 
additional portion of the explainable variance. 
These data indicate that, for the raters, ions, 
and concentrations here employed, there were 
no important synergistic or masking effects 
between ions as they influenced taste ratings. 
Rather, each ion appeared to make a straight- 
forward contribution to the ratings obtained 
according to its concentration in solution with- 
out affecting or being affected by other ionic 
constituents. The same interpretation holds 
for individual results since multiple correla- 
tion coefficients were generally high as shown 
in Table 3. Further, most individual regres- 
sion equations were very similar to the sum- 
mary regression equations reported. 

A further opportunity to test the validity 
of independent ionic influence on taste quality 
ratings arises in connection with simple min- 
eral solutions rated a year before the present 
study was undertaken (Bruvold, 1968). If the 
conclusion of independence is valid, it should 
be possible to reproduce the earlier ratings
using only the multiple regression equations 
here derived.


TABLE 5

PREDICTED AND OBSERVED MEAN RATINGS FOR SIMPLE SOLUTIONS

                               Mean Q rating            Mean AT rating
Mineral   Concentration     Predicted  Observed      Predicted  Observed

NaHCO3    1,000 mg/l          6.48       55            6.60       5.79
Na2SO4    1,000 mg/l          5.85       5.79          6.05       6.88
NaCl      1,000 mg/l          5.10       4.49          5.56       5.43
Na2CO3    1,000 mg/l          3.04       3.11          2.93       3.58


Following this reasoning, predicted mean Q and AT ratings were obtained
for each of four simple mineral solutions by 
introducing appropriate values for the anion 
and cation in question, setting remaining 
ionic values equal to zero, and solving the re- 
gression equation in a straightforward fashion. 
Predicted mean ratings obtained in this man- 
ner are shown in Table 5 together with the 
mean ratings actually obtained from the 
earlier study. It may be seen that the agree- 
ment between predicted and observed mean 
ratings was good considering variations in 
procedures and raters (11 raters were com- 
mon to both studies). Such agreement would 
not have been obtained if ions combine in a 
complex manner to determine taste ratings. 
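
The prediction step described above can be sketched in a few lines. This is only an illustration, not the authors' computation: it assumes that each 1,000 mg/l solution is split into sodium and anion by standard molar-mass fractions, and the names `Q_EQ`, `AT_EQ`, `ion_mg`, and `predict` are hypothetical. The coefficients are the rounded ones printed in the summary regression equations, so small departures from the predicted values in Table 5 are to be expected.

```python
# Standard atomic masses (g/mol).
MASS = {"Na": 22.990, "H": 1.008, "C": 12.011, "O": 15.999, "S": 32.06, "Cl": 35.45}

# Summary regression equations from the text (coefficients per mg/l of each ion).
Q_EQ = {"const": 6.81, "Na": 0.020, "HCO3": -0.008, "CO3": -0.022, "Cl": -0.016, "SO4": -0.011}
AT_EQ = {"const": 7.22, "Na": 0.011, "HCO3": -0.005, "CO3": -0.016, "Cl": -0.010, "SO4": -0.007}

# Molar mass of each anion.
ANION_MASS = {
    "HCO3": MASS["H"] + MASS["C"] + 3 * MASS["O"],
    "CO3": MASS["C"] + 3 * MASS["O"],
    "SO4": MASS["S"] + 4 * MASS["O"],
    "Cl": MASS["Cl"],
}

# Each simple salt: (number of sodium atoms, anion).
SALTS = {"NaHCO3": (1, "HCO3"), "Na2SO4": (2, "SO4"), "NaCl": (1, "Cl"), "Na2CO3": (2, "CO3")}


def ion_mg(salt, total_mg=1000.0):
    """Split a salt concentration (mg/l) into sodium and anion by mass fraction."""
    n_na, anion = SALTS[salt]
    na_mass = n_na * MASS["Na"]
    anion_mass = ANION_MASS[anion]
    formula_mass = na_mass + anion_mass
    return {"Na": total_mg * na_mass / formula_mass,
            anion: total_mg * anion_mass / formula_mass}


def predict(equation, ions):
    """Evaluate a first-degree equation; ions not present contribute zero."""
    return equation["const"] + sum(equation[ion] * mg for ion, mg in ions.items())


for salt in SALTS:
    ions = ion_mg(salt)
    print(salt, round(predict(Q_EQ, ions), 2), round(predict(AT_EQ, ions), 2))
```

Setting the remaining ionic values to zero is implicit here: an ion that is absent from the dictionary simply adds nothing to the sum.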
Results of the regression analyses involving 
TDS, rather than individual ions, showed that 
there was a significant linear trend in the data 
which accounted for the bulk of the explain- 
able variance. Thus, multiple regression was 
not markedly superior to simple regression in 
terms of total variance accounted for. How- 
ever, inspection of Figure 1 shows that cer- 
tain means deviated considerably from the 
line of best fit. This result indicates that the 
multiple regression approach is preferable to 
the simpler TDS approach for establishing 
limiting standards since it would not mis- 
predict rating scores as grossly as the simpler 
approach for unusual water samples contain- 
ing very high concentrations of some ions 
and very low concentrations of others. Such 
mispredictions could cause great difficulty in 
attempts to establish and administer limiting 
standards for mineral content in domestic 
water. TDS standards could result in ap- 
proval of unpotable waters for daily con- 
sumption while requiring demineralization for 
waters actually suitable for daily drinking. 
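
The practical point can be illustrated with a rough sketch. Two hypothetical waters with the same TDS of 1,000 mg/l receive one and the same predicted quality rating from the quadratic TDS equation, while the ionic equation separates them sharply. The ion contents below are assumptions for illustration (approximate mass fractions for solutions of NaHCO3 and Na2CO3), and the function names are not from the study.

```python
def q_from_tds(tds):
    """Quadratic TDS equation for the quality scale, as given in the text."""
    return 7.50 - 0.0036 * tds + 0.0000008 * tds ** 2


def q_from_ions(na=0.0, hco3=0.0, co3=0.0, cl=0.0, so4=0.0):
    """Summary multiple regression equation for the quality scale (mg/l inputs)."""
    return 6.81 + 0.020 * na - 0.008 * hco3 - 0.022 * co3 - 0.016 * cl - 0.011 * so4


# Hypothetical waters, both with TDS = 1,000 mg/l (approximate ion contents).
bicarbonate_water = {"na": 274.0, "hco3": 726.0}
carbonate_water = {"na": 434.0, "co3": 566.0}

tds_rating = q_from_tds(1000.0)
bicarbonate_rating = q_from_ions(**bicarbonate_water)
carbonate_rating = q_from_ions(**carbonate_water)

print(round(tds_rating, 2), round(bicarbonate_rating, 2), round(carbonate_rating, 2))
```

Under these assumptions the TDS equation rates both waters identically, whereas the ionic equation places the bicarbonate water well above and the carbonate water well below that single value.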
The statistically significant, but not highly 
important, quadratic trends found in the TDS 
regression analyses were probably due to the 
fact that mean carbonate concentrations, con- 
centrations impossible to control precisely 




during solution preparation, were relatively 
higher for the 1,100 mg/l samples than they 
were for the remaining TDS levels. Had the 
relative carbonate concentrations been lower 
for the 1,100 mg/l samples, a negative quad- 
ratic term might have been obtained. Thus, 
the multiple regression approach also appears 
preferable to second or third degree TDS 
functions for establishing limiting standards. 
The present results suggest that the nature 
of these higher order terms was related to the 
relative pattern of ionic concentrations in the 
samples studied. Different patterns of such 
concentrations would require separate equa- 
tions to describe adequately each set of data, 
while, conceivably, one multiple linear regres- 
sion equation could adequately describe all 
such data sets. 

Therefore it is concluded on the basis of 
the present results that a multiple linear re- 
gression equation using separate ionic con- 
centrations as independent variables should 
provide the best method for establishing limit- 
ing standards for mineral content in domestic 
water. Consumer ratings of natural waters 
(Bruvold et al., 1967) will provide the data 
for establishing standards and for further 
evaluation of conclusions regarding indepen- 
dent anion effects based on this research. 
Multiple stepwise correlation analyses were conducted on each of seven personality
inventories against the rated creativity of 62 architects from a nationwide
sample. In each analysis the best combination of three variables was
identified. The equations derived from the analyses were then validated on a
second sample of 62 architects. Initial multiple correlations ranged from .80 to
.35; cross-validated coefficients ranged from .55 to .20. Specific cross-validated
coefficients were as follows: Adjective Check List (ACL), .38; California
Psychological Inventory (CPI), .47; FIRO-B, .41; Minnesota Multiphasic
Personality Inventory (MMPI), .20; Myers-Briggs Type Indicator (MBTI),
.42; Strong Vocational Interest Blank (SVIB), .55; and Allport-Vernon-Lindzey
Study of Values (A-V-L), .38.


1 This is a slightly modified version of a paper
presented at the annual meeting of the Western Psychological
Association, Honolulu, Hawaii, June 14-19,
1965.

2 Requests for reprints should be sent to Donald
W. MacKinnon, Director, Institute of Personality
Assessment and Research, University of California,
2240 Piedmont Avenue, Berkeley, California 94720.


An assessment study of a nationwide sample
of American architects (MacKinnon,
1962, 1965) provided an opportunity to examine
the relative validity of a variety of
personality inventories, using ratings of creativity
as a criterion.

Architects were chosen for study on the
grounds that they should clearly manifest
those personality characteristics which are
typical of the creative person. Behind this
reasoning lies the observation that if an
architect's designs are to give delight, the
architect must be an artist, and if they are
to be technologically sound and efficiently
planned, he must also be something of a scientist,
at least an applied scientist or engineer.
Yet surely it is not sufficient that an architect
be both artist and scientist if he is to be
highly successful in the practice of his profession.
He must also to some extent be businessman,
lawyer, advertiser, author-journalist,
psychiatrist, educator, and psychologist.

The total sample consisted of 124 American
architects. Forty of them, constituting a nationwide
sample and here designated as Architects 1,
were nominated by a panel of five
professors of architecture at the University of
California, Berkeley, for the unusual creative- 
ness they had shown in the practice of their 
profession. 

The second group, Architects 2, consisted 
of 43 architects chosen so as to match Archi- 
tects 1 with respect to age and the geographic 
location of their practice. Each of them met 
the additional requirement that he had had 
at least 2 yr. of work experience and associa- 
tion with one of the originally nominated 
creative architects. 

The third sample, Architects 3, was also 
chosen to match Architects 1 with respect to 
age and geographic location of practice, but, 
unlike Architects 2, the 41 men in this group 
had never worked with any of the Architects 1.

The three samples were selected in this 
manner in the hope of tapping a range of 
creative talent sufficiently wide to be fairly 
representative of the profession as a whole. 
To determine whether or not this objective 
was met, ratings on a 7-point scale of the 
creativity of all 124 architects were obtained 
from six groups of architects and architectural
experts: the 5 members of the original nomi- 
nating panel at the University of California, 
19 professors of architecture distributed na- 
tionwide, 6 editors of the major American 
architectural journals, 32 Architects 1, 36 
Architects 2, and 28 Architects 3. 

The mean intercorrelation among these six 
rating groups provides an estimate of the re- 
liability of the ratings, namely, .84. The mean 


CREATIVITY AMONG ARCHITECTS


ratings of creativity for the three groups are 
Architects 1, 5.46; Architects 2, 4.25; and 
Architects 3, 3.54. The differences are in the 
expected direction and are statistically sig- 
nificant (p < .001). In other words, the three 
groups do represent significantly different 
levels of creativeness. At the same time, it 
must be noted that the three samples show an 
overlap in their judged creativeness; that is, 
they are not discrete and discontinuous. When 
the three groups are combined, the ratings 
approximate a normal distribution of judged 
creativeness ranging from a low of 1.9 to a 
high of 6.5 on a 7-point rating scale. 

Of the seven inventories to be examined, 
the SVIB (Strong, 1959) clearly yielded the 
largest number of significant correlations with 
the criterion. Of the 57 scales which were 
scored, 40 correlated significantly (p < .05), 
ranging from Artist, with a coefficient of .59, 
to Banker, with a coefficient of -.66.³ The
5 scales showing the highest positive correlations
with the criterion were Artist, .59; Author-journalist,
.54; Lawyer, .44; Advertising
man, .42; and Musician, .38. Those with the
highest negative correlations were Banker, 
—.66; Office man, —.60; Accountant, —.54; 
Policeman, —.52; and Purchasing Agent, 
a0. 

In addition to this consideration of indi- 
vidual scales, all 57 scales of the SVIB were 
used to derive an optimum three-variable 
multiple regression equation, using as a cri- 
terion the architects’ rated creativity. 

In order that the multiple regression equa- 
tion might be cross-validated, the total sam- 
ple of 124 architects was divided into two 
subsamples of 62, matched on creativity. 
From the roster of Architects 1, 2, and 3, 
listed according to the rank order of their 
creativity within each sample, odd-numbered 
Ss were assigned to the sample on which the 


3 A 2-page table giving correlations with rated
creativity for the total sample of 124 architects of all 
variables scored on the 7 personality inventories ex- 
amined has been deposited with the National Aux- 
iliary Publications Service. Order Document No. 
00458 from National Auxiliary Publications Research 
and Microfilm Publication, Inc. Remit in advance 
$3.00 for photocopies or $1.00 for microfiche and 
make checks payable to: Service of the American 
Society for Information Service, Inc., 22 West 34th 
Street, New York, New York 10001. 
TABLE 1

RELATIONSHIP OF SEVEN PSYCHOLOGICAL TESTS TO
RATINGS OF CREATIVITY, IN TWO
SAMPLES OF ARCHITECTS

                                 Regression      R (Initial     R (Cross-validating
Test                             coefficients    sample) a      sample) b

SVIB (57 variables)
  Office man                       -.543
  Banker                           -.503           .802*          .55*
  President-Mfg. Concern           -.258
CPI (18 variables)
  Sp                                .547
  Ac                              -1.015           .57*           .471*
  Fe                                .990
MBTI
  E (extraversion)                 -.350
  N (intuition)                     .386           .53*           .417*
  P (perception)                    .556
FIRO-B (6 variables)
  E^I                             -1.847
  E^C                               .970           .63*           .406*
  W^C                             -1.077
A-V-L (6 variables)
  Economic                         -.786
  Social                           -.272           .640*          .382*
  Religious                        -.181
ACL (24 variables)
  Self-confidence (Gough key)      -.302
  Autonomy (Heilbrun key)          [illegible]     .607*          .38*
  Change (Heilbrun key)             .413
MMPI (13 variables)
  Mt                                .980
  F                                 .808           .348*          .197
  Pa (with K correction)            .718

a n = 62.
b n = 62.
* p < .01.


multiple regression solution was derived, even- 
numbered Ss to the sample on which the equa- 
tion was cross-validated. 

The computational program which was em- 
ployed proceeds by steps, each of which 
yields a trial set of predictors and correspond- 
ing regression statistics. At each step, one 
variable is added or deleted to produce the 
next trial set. The choice of variable guaran- 
tees either a larger set which is significantly 
better, or a smaller set which is not signifi- 
cantly worse. Under this procedure the new 
set is the best that can be obtained by a single
addition or deletion. In order to insure com- 
parability of results across tests, an arbitrary 
decision was made to stop the analysis after 
three variables had been selected; thus, for 
each test, there would be an optimum
three-variable equation.⁴
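
The general shape of such a procedure can be sketched as follows. This is a simplified illustration rather than the program actually used: it only adds variables (the original also allowed deletions and applied significance tests), it simply stops at three predictors, and it runs on synthetic data. All names (`forward_select`, `multiple_r`, `scores`, `rating`) and the data-generating weights are hypothetical. The odd/even split on the ranked criterion mirrors the matching of the two subsamples, ignoring the stratification by architect group.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(rows, y, cols):
    """Least-squares weights (intercept first) for the chosen predictor columns."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(rows, cols, weights):
    """Apply an intercept-plus-weights equation to each row."""
    return [weights[0] + sum(w * row[c] for w, c in zip(weights[1:], cols)) for row in rows]


def pearson(a, b):
    """Product-moment correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def multiple_r(rows, y, cols):
    """Multiple correlation: correlation of fitted values with the criterion."""
    return pearson(predict(rows, cols, fit(rows, y, cols)), y)


def forward_select(rows, y, k=3):
    """At each step add the single variable that most increases multiple R."""
    chosen = []
    while len(chosen) < k:
        candidates = [c for c in range(len(rows[0])) if c not in chosen]
        chosen.append(max(candidates, key=lambda c: multiple_r(rows, y, chosen + [c])))
    return chosen


# Synthetic stand-in data: 124 people, 8 scales, criterion driven by scales 0, 1, 2.
random.seed(0)
scores = [[random.gauss(0, 1) for _ in range(8)] for _ in range(124)]
rating = [s[0] - 0.8 * s[1] + 0.6 * s[2] + random.gauss(0, 0.3) for s in scores]

# Rank on the criterion; odd-numbered ranks derive the equation, even ones check it.
order = sorted(range(len(rating)), key=lambda i: rating[i], reverse=True)
derive, check = order[0::2], order[1::2]
d_rows, d_y = [scores[i] for i in derive], [rating[i] for i in derive]
c_rows, c_y = [scores[i] for i in check], [rating[i] for i in check]

chosen = forward_select(d_rows, d_y, k=3)
weights = fit(d_rows, d_y, chosen)
r_initial = pearson(predict(d_rows, chosen, weights), d_y)
r_cross = pearson(predict(c_rows, chosen, weights), c_y)
print(sorted(chosen), round(r_initial, 2), round(r_cross, 2))
```

The weights are estimated once, on the derivation half, and then applied unchanged to the other half; the drop from the initial to the cross-validated correlation is the shrinkage the text reports for each inventory.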

As Table 1 shows, the three SVIB variables 
selected for the multiple regression, all with 
negative weights, were the scales for Office 
Man, Banker, and President-Manufacturing 
Concern. The multiple correlation coefficient
between these three scales and the criterion 
of rated creativity was .80. In the cross- 
validating sample the correlation dropped to 
.55, a still highly significant value. 

In the case of the CPI (Gough, 1964), 11 
of the 18 scales had significant correlations 
with the architects’ rated creativity: Fe (fem- 
ininity), .24; Fx (flexibility), .24; Sa (self- 
acceptance), .19; Sp (social presence), .18; 
Cm (communality), —.31; Sc (self-control), 
—.31; Ac (achievement via conformance), 
—.24; Gi (good impression), —.23; To (tol- 
erance), —.21; Re (responsibility), —.20; 
and Wb (well-being), —.20. The three vari- 
ables selected in the multiple-regression solu- 
tion were Sp and Fe with positive weights, 
and Ac with a large negative weight. The 
initial validity of this CPI equation was .57 
which on cross-validation became .47. 

Of the eight scales of the MBTI (Myers, 
1962) which tests Jungian functions and at- 
titudes, four were significantly correlated with 
creativity in the total sample of architects: 
Intuition, .45; Perception, .40; Sensing, —.41; 
and Judgment, —.29. 

The three variables of the MBTI selected
by the IBM program for the multiple regres- 
sion equation were Extraversion with a nega- 
tive weight and Intuition and Perception, with 
positive weights. The equation correlated .53 
with the criterion in the first sample, and .42 
in cross-validation. 

On Schutz’s (1967) test of interpersonal be- 
havior, FIRO-B, four of the six scales were 
significantly correlated with the criterion: E^I


4 An empirical check with the SVIB, using 4-scale 
and 5-scale equations, revealed greater shrinkage in 
cross-validation than for the 3-scale equation. The 
3-scale combination, that is, has an additional ad- 
vantage of greater stability in cross-validation. 


(desire to include others in one's activities)
correlates -.44, W^I (desire to be included in
others' activities) correlates -.26, W^C (desire
to be controlled by others) -.24, and E^C
(desire to control others) .34.

The three scales selected by the computer- 
programmed multiple regression solution are 
E^I and W^C, both with negative weights, and,
with a positive weight, E^C. The initial validity
of the equation was .63, which on cross-valida- 
tion dropped to .41, again a still highly sig- 
nificant correlation. 

On the A-V-L (Allport, Vernon, & Lindzey, 
1960), three variables were significantly cor- 
related with the criterion in the total sample 
of 124: the theoretical value, .18; the aesthetic
value, .35; and the economic value,
-.48. (For n = 124, a value of .18 is significant
at the .05 level, a value of .23 significant
at the .01 level.)

However, the three A-V-L variables se- 
lected by the IBM program for the multiple 
regression equation were the economic, social, 
and religious values, all with negative weights. 
The initial correlation of the equation was 
.64 which on cross-validation dropped to .38. 
The cross-validated value is still significant,
since with an n of 62 a value of .25 is significant
at the .05 level, a value of .32 significant
at the .01 level.

The ACL (Gough & Heilbrun, 1965) may be 
scored for 24 variables, and scores on 14 of 
these scales were significantly correlated with 
the criterion in the total sample of architects. 
Those correlating positively were Change, 
.45; Exhibition, .36; Autonomy, .36; Aggression,
.34; Lability, .27; and Number of Unfavorable
Adjectives Checked, .24; while those
showing negative correlations were Self-con- 
trol, —.39; Deference, —.35; Personal Ad- 
justment, —.33; Order, —.31; Nurturance, 
—.30; Intraception, —.25; Affiliation, —.24; 
and Endurance, —.23. 

The three scales of the ACL selected for 
the multiple regression solution were Self- 
confidence with negative weighting, and Au- 
tonomy and Change both positively weighted. 
The equation correlated .61 with the criterion 
in the first sample and .38 in cross-validation. 

Of the seven inventories which were ad- 
ministered, the MMPI (Hathaway & Mc- 
Kinley, 1951) fared least well. In the total 
TABLE 2


MEANS, STANDARD DEVIATIONS, AND CORRELATIONS OF COMPUTED SCORES FOR Cross- 











Test M SD 
SVIB | CPI 

SVIB 440.1 87.1 
CPI 430.4 44,9 19 
MBTI 4.40.9 62,9 68" .G4* 
FIRO-B 428.8 56.9 52s A9 
A-V-L 438.6 78.0 10% 23 
ACL 452.2 48,2 54" 39" 
MMPI 447.9 36.6 RLS: mL 
* p < .01.


sample, only 4 of the 13 regular scales had 
significant correlations with the criterion: Mf 
(femininity) .29, F (validity) .28, Pd (psy- 
chopathic deviate) .23, and Sc (schizophre- 
nia) .20. 

The validity of the multiple regression 
equation for the MMPI was .35 in the first 
sample, but shrank to .20 (p > .10) on cross- 
validation. The three variables in the equa- 
tion were L (lie), F (validity), and Pa 
(Paranoia), all with positive weights. 

Table 2 presents means and standard devia- 
tions for the seven scores derived from the 
multiple regression solutions computed for 
the cross-validation sample of 62 architects 
together with the score intercorrelations. 

Of the seven tests whose validities have 
been reviewed, it is interesting to note that it 
is the SVIB, not always seen as a personality 
measure, which surpassed the others in its 
ability to forecast the rated creativity of the 
architects. 

It is likewise worth noting that the MMPI, 
which would be thought of by many to be an 
instrument of choice in the study of crea- 
tivity because of its relevance to psycho- 
pathology and ego dysfunctions having mo- 
tivational implications, is relatively weak. 

A problem with regression equations con- 
taining a large number of variables is that it 
is difficult to interpret them psychologically. 
In the present instance, using a three-variable 
solution, one is struck by the good psycho- 
logical sense that can be made of the several 
equations. For example, the CPI equation, 


VALIDATING SAMPLE OF 62 ARCHITECTS


Correlations 


PIRO A-V-L | ACL | MMPI 


| MBTI | 
| | 
44 
56" 37" 
57* 37" 43* : 
2 20 344 06 


+ .547 Sp — 1.015 Ac + .990 Fe, emphasizes 
factors found repeatedly to characterize more 
creative persons. The spontaneity and self-confidence
reflected in the Social Presence
scale belie one stereotype of creativity, that
of the socially anxious and ineffective misfit. 
A large negative weight given to Achievement 
via Conformance goes along with the fre- 
quently found lower score on this scale rela- 
tive to Achievement via Independence, which 
is a combination quite characteristic of both 
highly creative and successful persons in the 
professions. The Femininity scale weighting 
reflects an often repeated finding that our 
creative Ss reveal an openness to their own 
feelings and emotions, a sensitive intellect 
and understanding self-awareness, with widely 
ranging interests including many which in 
Western culture are thought of as feminine. 
Another example may be found in the 
MBTI equation, -.350 E + .386 N + .556
P. The creative architect tends to be less
often extraverted in the Jungian sense. To 
oversimplify the case, the intense personal and 
social interaction required with many clients 
is not to his liking, and he would prefer time 
for contemplative thought and creative ac- 
tivity. However, as noted above for the CPI, 
one should not lose sight of the fact that he 
does interact with others with marked social 
presence, often with consummate skill. The 
creative architect prefers, as well, intuitive 
perception to the more prosaic and commonly 
found direct sensing and controlled, planned, 
and orderly judgmental approach to all experience.
There is a decided preference for
openness and receptivity, both to experience 
and new ideas, as well as for a concern with 
deeper meanings and possibilities inherent in 
things and situations. 

This paper has focused solely on concur- 
rent validities; one must recognize that a 
more important problem is whether or not, 
and to what extent, these personality inven- 
tories would demonstrate predictive validities 
(over time) of a comparable magnitude. This 
is an empirical question which only future 
research can answer, but the findings of this 
cross-sectional inquiry give some reason to 
hope that personality inventories may be 
proved to possess longitudinal as well as con- 
current utility. 
VALIDITY, PREDICTIVE EFFICIENCY, AND PRACTICAL
SIGNIFICANCE OF SELECTION TESTS


ERVIN W. CURTIS 1 AND EDWARD F. ALF


United States Naval Personnel Research Activity, San Diego 2


Despite many warnings, validity coefficients continue to be accepted at face
value as measures of practical significance. This practice is herein evaluated by
examining each functional relationship between three indexes of predictive
efficiency (r, r², and E) and three measures of practical significance: the
increase of the criterion mean, the expected proportion "satisfactory," and the
expected proportion in 18 criterion categories. The validity coefficient, r, is a
linear function of the increase of the criterion mean and very nearly a linear
function of the other two measures of practical significance; r² and E are
related to these three measures in a more curvilinear manner. A table is
presented that gives the proportion expected in each of 18 criterion categories
as a function of r and the selection ratio.


Brogden (1946) demonstrated that a selection
test's validity, r, is a linear function of
the difference between two criterion means: 
the mean for the group above the predictor 
cutoff and the mean for the population. Even 
more important, he showed that r equals 
the proportion improvement over chance that 
is possible with each selection ratio. Although 
these facts are not mentioned in most of the 
textbooks in the area of personnel selection 
(Dunnette, 1966; Ghiselli & Brown, 1948; 
Guilford, 1954, 1965; Guion, 1965; Horst, 
1966; Nunnally, 1959; Thorndike, 1949), 
their importance is underscored by the perva- 
sive tendency of psychologists to consider the 
validity coefficient not just as a measure of 
correlation, but as a measure of practical sig- 
nificance. Thus it is common practice to con- 
sider the correlation between a selection test 
and a criterion as an indication of the value 
of the test for the institution using the test. 
Brogden’s analysis shows that this is a rea- 
sonable thing to do, provided that the in- 
crease of the criterion mean is most important 
to the institution. An unanswered question is: 
How does r relate to other measures of practical
significance?

The facts that Brogden demonstrated also 
bear on another question that has not been 
adequately discussed in the psychological 

1 Requests for reprints should be sent to Ervin W. 
Curtis, 8853 Alpine, La Mesa, California 92041. 

2The contents of this paper do not necessarily 


represent the official position or policy of the Depart- 
ment of the Navy. 


literature: Which index of predictive efficiency,
r, r², or E (E = 1 - √(1 - r²)), is the best index
of practical significance? If the increase of
the criterion mean is a linear function of r, it
cannot be a linear function of r² or E. Therefore,
in terms of this measure of practical significance,
r is the better index. However, there
are other measures of practical significance
that are functions of r, r², and E. Which index
of predictive efficiency is the best index of
practical significance for each measure of
practical significance?

This paper explores the functional relation- 
ship between each index of predictive effici- 
ency and each of three measures of practical 
significance: (a) the increase of the criterion 
mean, (b) proportion satisfactory from the 
Taylor-Russell tables (1939), and (c) proportions
in 18 criterion categories.





INCREASE OF THE CRITERION MEAN 


Brogden (1946) related r to the increase of 
the criterion mean due to the selection test. 
He used the standard-score formula for r to 
show that 

$$ r = \frac{Z_S}{Z_H} = \frac{M_S - M_P}{M_H - M_P} \qquad [1] $$


where


M_S is the criterion mean of the selected group

M_P is the criterion mean of the population

M_H is the criterion mean of the upper tail of
the criterion distribution equal in number
to the selected group

Z_S is the z score of M_S

Z_H is the z score of M_H.


FIG. 1. Correlation surface and criterion distribution showing the means of the
"high" group, the selected group, and the population.


These means are shown in Figure 1. The unfamiliar
one, M_H, is the mean of the selected
group when the selection test is a perfect
predictor of the criterion, that is, when r
equals 1.00. This mean provides the upper
limit for M_S, the mean of the selected group.
The lower limit is M_P if negative correlation
is ruled out.

The numerator in Formula 1, M_S - M_P, is
the actual increase of the criterion mean, while
the denominator is the largest possible increase.
Therefore, r equals the proportion of
possible increase actually achieved. For example,
an r of .05 indicates that the test
provides 5% of the improvement over chance
that a perfect test would provide; an r of .50,
50%; an r of .95, 95%; etc.

Cross-multiplying Formula 1 yields

$$ r(M_H - M_P) = M_S - M_P. \qquad [2] $$

Since M_H - M_P is a constant when the proportion
selected is held constant, M_S - M_P is
a linear function of r. Figure 2 shows this
linear relationship. The curves for r² and E
are shown for comparison.
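
Formula 2 can be illustrated numerically. The sketch below assumes a bivariate normal relation between predictor and criterion with strict top-down selection, so that the largest possible increase, M_H - M_P, equals the mean of the upper tail of a standard normal distribution (in criterion standard deviation units). The function name `mean_gain` and the chosen selection ratio are hypothetical.

```python
from statistics import NormalDist


def mean_gain(r, selection_ratio):
    """Increase of the criterion mean, M_S - M_P, in criterion SD units."""
    z = NormalDist()
    cutoff = z.inv_cdf(1.0 - selection_ratio)          # predictor cutoff as a z score
    largest_gain = z.pdf(cutoff) / selection_ratio     # M_H - M_P, reached when r = 1.00
    return r * largest_gain                            # Formula 2


selection_ratio = 0.50
for r in (0.05, 0.50, 0.95, 1.00):
    print(r, round(mean_gain(r, selection_ratio), 4))

# Equal steps on the r scale give equal steps in the criterion mean.
low_step = mean_gain(0.10, selection_ratio) - mean_gain(0.05, selection_ratio)
high_step = mean_gain(0.95, selection_ratio) - mean_gain(0.90, selection_ratio)
```

Because the selection ratio fixes `largest_gain`, the gain is a straight line through the origin when plotted against r, which is the relationship described for Figure 2.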

Thus, Brogden showed that r is a linear
function of the difference between the criterion
means of the selected group and the
population. This implies that the units of the
r scale have equal value for the institution
using the test, which is in sharp contrast to
the implications of r² and E. They imply that
the units at the high end of the r scale are
much more important than the units at the
low end. For example, a little computation
shows that E increases 30 times as much
when r increases from .90 to .95 as when r
increases from .05 to .10. In contrast, Brogden
showed that the criterion mean increases
equally in the two cases.
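
The "little computation" can be reproduced directly from the definition E = 1 - √(1 - r²). The sketch below is illustrative only, and the function name is hypothetical; with this definition the ratio comes out a little above 30.

```python
from math import sqrt


def forecasting_efficiency(r):
    """E = 1 - sqrt(1 - r^2), the index of predictive efficiency defined in the text."""
    return 1.0 - sqrt(1.0 - r * r)


# Change in E for the same .05 step at the high and low ends of the r scale.
high_step = forecasting_efficiency(0.95) - forecasting_efficiency(0.90)
low_step = forecasting_efficiency(0.10) - forecasting_efficiency(0.05)
e_ratio = high_step / low_step

# The same comparison for r squared.
r2_ratio = (0.95 ** 2 - 0.90 ** 2) / (0.10 ** 2 - 0.05 ** 2)

print(round(e_ratio, 1), round(r2_ratio, 1))
```

Both r² and E weight the upper end of the scale far more heavily than the lower end, whereas the increase of the criterion mean is the same for the two steps.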


PROPORTION SATISFACTORY 


The mean rise in criterion scores is not the 
only important effect of selection tests upon 
criterion distributions. Another effect is the 
change in the proportion of selectees above a 
critical point on the criterion—a point that 
has special significance for the institution 
using the test. Taylor and Russell (1939) 
provided tables that give the proportion of 
selectees expected to be “satisfactory” on the 
criterion for any combination of r, base rate,
and selection ratio.

With "proportion satisfactory" on the
ordinate, graphs like Figure 2 were drawn
using data in the Taylor-Russell tables. A
set of curves for r, r², and E was plotted for
each of the 121 combinations of base rate and
selection ratio in the Taylor-Russell tables.
When r varies above .70, which is rare in
selection testing, there is no consistent difference
in terms of linearity for r, r², and E. In
contrast, when r varies below .70, the curve
for r in most of the 121 sets of curves is decidedly
more linear than the curves for r² and
E. Usually, the set looks much like the set in
Figure 2.


FIG. 2. Increase of the criterion mean as a function of r, r², and E.

As a rigorous test of relative linearity, the
correlation between each of the three indexes
and proportion satisfactory was computed.
Fifteen curves were not tested because they
are nearly horizontal, that is, proportion satisfactory
changes by three or less points between
r values of zero and 1.0. Of the 106
curves tested, 101 cases yielded a higher correlation
for r than for r² or E, two yielded
equal correlations, and three yielded lower
correlations for r. Therefore, proportion satisfactory
is more nearly a linear function of r
than of r² or E.

The curves relating r and proportion
satisfactory are very nearly linear when r varies
between zero and .70. The correlation between
r and the proportion satisfactory is .95 or
greater for 114 of the 121 curves. The seven
exceptions have extreme selection ratios and
proportion satisfactory values of .90 or .95.

Thus, when proportion satisfactory is the
measure of practical significance, the r scale
is more meaningful for selection test
evaluation than either the r² or the E scale.


PROPORTIONS IN FINER CRITERION 
CATEGORIES 


The Taylor-Russell tables are not directly 
applicable when there are more than two im- 
portant criterion categories, which is usually 
true when the criterion is school grades or 
supervisor ratings. When this is the case, the 
proportion of selectees expected in each cate- 
gory can be computed using the United 
States Department of Commerce (1959) tables 
of the bivariate normal distribution. Each 
combination of selection ratio and r yields a 
set of expected proportions. 
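
For readers without access to printed bivariate normal tables, the same kind of expected proportion can be approximated numerically. The sketch below is not the procedure used for Table 1 (which relied on the published tables). It assumes a standard bivariate normal distribution of test and criterion scores and uses Simpson's rule. All function names are hypothetical.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def selection_ratio(cutoff):
    """Proportion of applicants whose test score exceeds the cutoff (sigma units)."""
    return 1.0 - normal_cdf(cutoff)

def proportion_in_category(r, cutoff, low, high, steps=4000, upper=8.0):
    """Approximate P(low < criterion < high | test > cutoff) for |r| < 1.

    Integrates over test scores from the cutoff to `upper` sigma with
    Simpson's rule; `steps` must be even.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("this sketch requires |r| < 1")
    spread = math.sqrt(1.0 - r * r)  # conditional SD of the criterion

    def integrand(x):
        inside = normal_cdf((high - r * x) / spread) - normal_cdf((low - r * x) / spread)
        return normal_pdf(x) * inside

    h = (upper - cutoff) / steps
    total = integrand(cutoff) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(cutoff + i * h)
    joint = total * h / 3.0
    return joint / selection_ratio(cutoff)

# Example: cutoff at +1.0 sigma (selection ratio .1587), r = .50,
# criterion category from +1.0 to +1.4 sigma.
print(round(selection_ratio(1.0), 4))
print(round(proportion_in_category(0.5, 1.0, 1.0, 1.4), 3))
```

When r is zero the result reduces to the unconditional normal-curve area for the category, which gives a quick check on the routine.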

Table 1 presents the results of the
computations for 18 criterion categories. It gives

TABLE 1
PROPORTION EXPECTED IN 18 CRITERION CATEGORIES FOR GIVEN VALUES OF r AND SELECTION RATIO

Criterion                                     r
category        .0   .1   .2   .3   .4   .5   .6   .7   .8   .9   1.0


Selection ratio: .0287
2.3 to ∞        011  017  031  045  066  094  129  174  233  307  373
1.9 to 2.3      018  028  038  059  077  101  132  167  216  296  627
1.4 to 1.9      052  073  098  125  157  188  223  261  289  296  000
1.0 to 1.4      078  101  122  143  164  181  188  185  160  084  000
0.7 to 1.0      083  098  115  122  129  129  119  098  059  014  000
0.5 to 0.7      066  077  080  084  080  073  063  045  024  004  000
0.3 to 0.5      074  077  080  077  073  063  049  028  011  000  000
0.1 to 0.3      078  080  077  073  066  052  038  021  004  000  000
0.0 to 0.1      040  042  038  035  028  021  011  007  004  000  000
-0.1 to 0.0     040  042  035  031  028  017  011  007  004  000  000
-0.3 to -0.1    078  070  066  052  042  028  017  004  000  000  000
-0.5 to -0.3    074  066  056  045  031  021  011  004  000  000  000
-0.7 to -0.5    066  059  045  035  021  014  004  004  000  000  000
-1.0 to -0.7    083  066  049  035  021  011  004  000  000  000  000
-1.4 to -1.0    078  056  042  024  014  004  004  000  000  000  000
-1.9 to -1.4    052  035  021  011  007  004  000  000  000  000  000
-2.3 to -1.9    018  01?  004  004  000  000  000  000  000  000  000
-∞ to -2.3      011  007  003  000  000  000  000  000  000  000  000

Selection ratio: .0808
2.3 to ∞        011  017  025  035  047  062  078  097  115  130  132
1.9 to 2.3      018  025  035  047  059  074  094  116  147  189  223
1.4 to 1.9      052  069  088  108  132  160  191  229  277  353  645
1.0 to 1.4      078  095  114  132  151  171  188  207  220  219  000
0.7 to 1.0      083  095  108  119  128  134  137  132  119  077  000
0.5 to 0.7      066  074  079  082  084  083  079  071  051  020  000
0.3 to 0.5      074  078  080  083  080  076  067  053  033  009  000
0.1 to 0.3      078  079  079  077  072  064  053  038  019  003  000
0.0 to 0.1      040  040  038  036  033  029  021  011  006  000  000
-0.1 to 0.0     040  038  037  033  030  025  019  011  004  000  000
-0.3 to -0.1    078  074  068  061  052  041  027  015  005  000  000
-0.5 to -0.3    074  068  059  051  041  031  019  009  001  000  000
-0.7 to -0.5    066  058  051  041  030  020  011  005  001  000  000
-1.0 to -0.7    083  071  056  042  030  019  010  003  000  000  000
-1.4 to -1.0    078  061  046  032  020  011  004  001  000  000  000
-1.9 to -1.4    052  037  026  016  009  005  001  000  000  000  000
-2.3 to -1.9    018  012  007  004  003  001  000  000  000  000  000
-∞ to -2.3      011  006  004  001  000  000  000  000  000  000  000

Selection ratio: .1587
2.3 to ∞        011  016  021  028  035  043  052  059  065  067  067
1.9 to 2.3      018  024  031  040  049  059  070  083  098  110  113
1.4 to 1.9      052  066  081  096  115  136  159  188  224  276  328
1.0 to 1.4      078  092  107  123  139  156  176  199  228  274  491
0.7 to 1.0      083  094  104  113  123  132  141  149  155  154  000
0.5 to 0.7      066  073  078  082  086  088  089  088  081  062  000
0.3 to 0.5      074  078  080  083  083  083  079  073  060  033  000
0.1 to 0.3      078  080  081  080  078  073  067  057  040  015  000
0.0 to 0.1      040  040  040  038  036  033  028  022  014  003  000
-0.1 to 0.0     040  039  038  036  033  030  025  019  011  002  000
-0.3 to -0.1    078  076  071  066  059  050  040  027  013  002  000
-0.5 to -0.3    074  069  062  056  048  039  028  017  006  001  000
-0.7 to -0.5    066  060  053  045  037  028  019  010  003  000  000
-1.0 to -0.7    083  073  062  050  038  027  016  007  001  000  000
-1.4 to -1.0    078  064  050  038  027  016  008  003  001  000  000
-1.9 to -1.4    052  040  029  020  012  006  002  001  000  000  000
-2.3 to -1.9    018  013  009  005  003  001  001  001  000  000  000
-∞ to -2.3      011  007  004  002  001  000  000  000  000  000  000
Selection ratio: .2420
2.3 to ∞        011  014  019  024  029  033  038  042  044  044  044
1.9 to 2.3      018  023  029  035  041  049  056  063  070  074  074
1.4 to 1.9      052  063  075  088  103  118  136  156  180  205  215
1.0 to 1.4      078  090  103  115  129  145  162  184  212  255  322
0.7 to 1.0      083  092  101  110  119  129  139  151  167  191  344
0.5 to 0.7      066  072  076  081  085  088  092  096  098  098  000
0.3 to 0.5      074  077  080  082  084  086  086  085  080  067  000
0.1 to 0.3      078  080  081  081  081  079  076  070  060  038  000
0.0 to 0.1      040  040  040  039  038  036  033  029  022  011  000
-0.1 to 0.0     040  039  039  037  036  034  030  025  018  007  000
-0.3 to -0.1    078  076  072  069  064  057  049  039  025  007  000
-0.5 to -0.3    074  070  065  060  053  046  037  026  014  003  000
-0.7 to -0.5    066  061  055  049  042  034  026  016  007  001  000
-1.0 to -0.7    083  074  065  055  044  034  023  012  004  000  000
-1.4 to -1.0    078  066  055  043  032  022  012  005  001  000  000
-1.9 to -1.4    052  042  031  023  015  009  004  001  000  000  000
-2.3 to -1.9    018  013  010  006  004  002  001  000  000  000  000
-∞ to -2.3      011  007  005  002  001  000  000  000  000  000  000
Selection ratio: .3085
2.3 to ∞        011  014  018  021  025  029  031  034  035  035  035
1.9 to 2.3      018  022  027  032  037  043  048  053  057  058  058
1.4 to 1.9      052  062  072  083  095  108  122  137  152  166  169
1.0 to 1.4      078  089  100  111  124  137  152  171  195  227  253
0.7 to 1.0      083  091  099  108  116  125  136  149  166  195  270
0.5 to 0.7      066  071  075  079  084  088  093  098  105  115  216
0.3 to 0.5      074  077  080  082  085  087  089  090  091  089  000
0.1 to 0.3      078  080  081  081  082  082  080  078  073  059  000
0.0 to 0.1      040  040  040  040  039  038  036  033  029  019  000
-0.1 to 0.0     040  040  039  038  037  035  028  029  024  014  000
-0.3 to -0.1    078  076  074  071  067  062  055  054  034  015  000
-0.5 to -0.3    074  070  066  061  056  050  043  026  021  007  000
-0.7 to -0.5    066  062  057  052  045  039  031  022  011  002  000
-1.0 to -0.7    083  075  067  058  049  039  028  017  007  001  000
-1.4 to -1.0    078  067  057  047  037  026  017  008  002  000  000
-1.9 to -1.4    052  043  034  026  018  011  006  002  000  000  000
-2.3 to -1.9    018  014  010  007  004  002  001  000  000  000  000
-∞ to -2.3      011  008  005  003  002  001  000  000  000  000  000
Selection ratio: .3821
2.3 to ∞        011  014  016  020  022  025  026  027  028  028  028
1.9 to 2.3      018  022  026  030  034  038  041  045  046  047  047
1.4 to 1.9      052  061  069  079  088  098  109  119  129  136  136
1.0 to 1.4      078  087  097  107  117  129  142  157  175  195  204
0.7 to 1.0      083  090  098  105  ???  121  131  143  160  185  218
0.5 to 0.7      066  070  074  079  082  087  092  099  107  123  174
0.3 to 0.5      074  076  079  082  085  087  091  094  099  107  193
0.1 to 0.3      078  080  081  082  083  084  084  084  084  080  000
0.0 to 0.1      040  040  040  040  040  039  038  037  035  029  000
-0.1 to 0.0     040  040  039  039  038  037  036  034  030  022  000
-0.3 to -0.1    078  076  075  072  069  066  061  055  046  028  000
-0.5 to -0.3    074  071  067  064  059  054  048  040  029  013  000
-0.7 to -0.5    066  062  058  053  049  043  036  028  017  005  000
-1.0 to -0.7    083  076  069  062  053  044  034  023  012  002  000
-1.4 to -1.0    078  069  059  050  041  031  021  012  004  000  000
-1.9 to -1.4    052  044  036  028  021  014  008  003  001  000  000
-2.3 to -1.9    018  014  011  008  005  003  001  001  000  000  000
-∞ to -2.3      011  008  006  004  002  001  000  000  000  000  000

Selection ratio: .4602
2.3 to ∞        011  013  016  018  020  021  023  023  023  023  023
1.9 to 2.3      018  021  024  028  031  034  036  038  039  039  039
1.4 to 1.9      052  059  067  074  082  090  097  105  110  113  113
1.0 to 1.4      078  086  094  103  112  121  132  143  156  167  169
0.7 to 1.0      083  089  096  102  109  117  126  136  150  169  181
0.5 to 0.7      066  070  073  077  081  086  091  097  107  122  145
0.3 to 0.5      074  076  079  081  084  087  091  096  103  115  160
0.1 to 0.3      078  077  081  082  083  085  087  088  092  097  170
0.0 to 0.1      040  040  040  040  040  040  040  040  040  039  000
-0.1 to 0.0     040  040  039  039  039  038  038  037  035  032  000
-0.3 to -0.1    078  077  075  073  072  069  067  063  057  045  000
-0.5 to -0.3    074  071  068  065  062  059  054  048  039  024  000
-0.7 to -0.5    066  063  060  056  051  047  041  034  024  011  000
-1.0 to -0.7    083  077  071  064  057  050  041  030  018  005  000
-1.4 to -1.0    078  070  062  054  045  036  027  017  007  001  000
-1.9 to -1.4    052  045  038  031  024  017  010  005  001  000  000
-2.3 to -1.9    018  015  012  009  006  004  002  001  000  000  000
-∞ to -2.3      011  008  006  004  002  001  000  000  000  000  000

Selection ratio: .5398
2.3 to ∞        011  013  015  016  018  019  019  020  020  020  020
1.9 to 2.3      018  021  023  026  028  030  032  033  033  033  033
1.4 to 1.9      052  058  065  070  076  082  088  092  095  097  097
1.0 to 1.4      078  085  092  099  106  ???  ???  130  138  144  144
0.7 to 1.0      083  089  094  100  105  112  120  128  139  150  154
0.5 to 0.7      066  070  072  076  080  083  088  094  102  114  123
0.3 to 0.5      074  076  078  082  083  087  091  096  103  116  136
0.1 to 0.3      078  079  081  081  084  086  088  091  097  107  145
0.0 to 0.1      040  040  040  040  041  041  042  042  044  047  074
-0.1 to 0.0     040  040  040  040  040  039  040  040  040  041  074
-0.3 to -0.1    078  077  076  075  074  072  071  069  067  062  000
-0.5 to -0.3    074  072  069  067  065  062  059  055  049  038  000
-0.7 to -0.5    066  064  061  057  054  050  046  040  032  020  000
-1.0 to -0.7    083  078  073  067  061  055  047  038  026  011  000
-1.4 to -1.0    078  071  064  057  049  041  032  022  012  002  000
-1.9 to -1.4    052  046  040  033  027  020  014  007  003  000  000
-2.3 to -1.9    018  015  013  010  007  005  003  001  000  000  000
-∞ to -2.3      011  009  006  005  003  003  001  000  000  000  000

Selection ratio: .6179
2.3 to ∞        011  012  014  015  016  017  017  017  017  017  017
1.9 to 2.3      018  020  022  024  026  027  028  029  029  029  029
1.4 to 1.9      052  057  062  067  072  076  080  082  084  084  084
1.0 to 1.4      078  084  089  095  101  107  113  119  124  126  126
0.7 to 1.0      083  088  092  097  102  108  114  120  128  134  135
0.5 to 0.7      066  069  072  075  078  081  086  091  097  105  108
0.3 to 0.5      074  075  078  080  083  086  089  095  101  111  119
0.1 to 0.3      078  079  080  082  084  086  089  092  098  109  126
0.0 to 0.1      040  040  040  041  041  042  042  044  046  051  064
-0.1 to 0.0     040  040  040  040  040  040  041  042  043  046  064
-0.3 to -0.1    078  077  076  076  075  075  074  074  075  077  126
-0.5 to -0.3    074  072  070  069  067  065  063  061  058  053  000
-0.7 to -0.5    066  064  062  059  057  054  051  047  041  032  000
-1.0 to -0.7    083  079  074  070  065  060  054  046  036  020  000
-1.4 to -1.0    078  072  066  060  054  046  038  029  018  006  000
-1.9 to -1.4    052  047  041  036  030  024  017  011  004  001  000
-2.3 to -1.9    018  016  013  011  008  006  004  002  001  000  000
-∞ to -2.3      011  009  007  005  004  002  001  000  000  000  000

Selection ratio: .6915
2.3 to ∞        011  012  013  014  015  015  015  015  015  015  015
1.9 to 2.3      018  020  022  023  024  025  026  026  026  026  026
1.4 to 1.9      052  056  060  064  067  070  073  075  075  075  075
1.0 to 1.4      078  083  087  092  096  101  105  109  112  113  113
0.7 to 1.0      083  087  091  095  099  103  108  113  117  120  121
0.5 to 0.7      066  069  071  073  076  079  082  087  091  095  096
0.3 to 0.5      074  075  077  079  081  084  088  092  097  104  106
0.1 to 0.3      078  079  080  081  083  085  088  092  098  106  113
0.0 to 0.1      040  040  040  041  041  042  043  045  047  052  058
-0.1 to 0.0     040  040  040  040  040  041  042  043  045  049  058
-0.3 to -0.1    078  077  077  077  076  077  077  078  081  087  113
-0.5 to -0.3    074  072  071  070  069  068  067  066  066  067  106
-0.7 to -0.5    066  064  063  061  059  057  055  052  049  045  000
-1.0 to -0.7    083  080  076  073  069  065  060  054  047  034  000
-1.4 to -1.0    078  073  068  063  058  052  045  036  026  012  000
-1.9 to -1.4    052  048  043  038  033  027  021  015  008  001  000
-2.3 to -1.9    018  016  014  012  009  007  005  003  001  000  000
-∞ to -2.3      011  009  008  006  004  003  001  000  000  000  000
Selection ratio: .7580
2.3 to ∞        011  012  013  013  014  014  014  014  014  014  014
1.9 to 2.3      018  020  021  022  023  023  024  024  024  024  024
1.4 to 1.9      052  055  059  061  064  066  067  068  069  069  069
1.0 to 1.4      078  082  085  089  093  096  099  101  102  103  103
0.7 to 1.0      083  086  089  093  096  099  103  106  109  110  110
0.5 to 0.7      066  068  070  072  074  077  080  083  086  088  088
0.3 to 0.5      074  075  076  078  080  083  085  089  093  096  097
0.1 to 0.3      078  079  080  081  083  085  087  091  095  101  103
0.0 to 0.1      040  040  040  041  041  042  043  045  047  050  053
-0.1 to 0.0     040  040  040  040  041  041  042  043  045  049  053
-0.3 to -0.1    078  078  077  077  077  078  079  081  084  091  103
-0.5 to -0.3    074  072  072  071  070  070  070  070  072  076  097
-0.7 to -0.5    066  065  064  062  061  060  058  057  057  057  088
-1.0 to -0.7    083  081  078  075  072  069  065  062  057  049  000
-1.4 to -1.0    078  074  070  066  062  057  051  044  035  021  000
-1.9 to -1.4    052  049  045  041  036  031  025  019  011  003  000
-2.3 to -1.9    018  016  015  014  011  008  006  004  002  000  000
-∞ to -2.3      011  010  008  006  005  003  002  001  000  000  000

Selection ratio: .8413
2.3 to ∞        011  011  012  012  013  013  013  013  013  013  013
1.9 to 2.3      018  019  020  020  021  021  021  021  021  021  021
1.4 to 1.9      052  054  057  058  060  061  062  062  062  062  062
1.0 to 1.4      078  081  083  086  088  090  091  092  093  093  093
0.7 to 1.0      083  085  087  090  092  094  096  098  099  099  099
0.5 to 0.7      066  068  069  071  072  074  076  077  079  079  079
0.3 to 0.5      074  075  076  077  079  080  082  084  086  087  088
0.1 to 0.3      078  079  079  080  082  083  085  088  090  093  093
0.0 to 0.1      040  040  040  041  041  042  043  044  045  047  047
-0.1 to 0.0     040  040  040  040  041  041  042  043  045  047  047
-0.3 to -0.1    078  078  078  078  078  079  080  082  085  090  093
-0.5 to -0.3    074  073  072  072  072  072  073  074  076  081  088
-0.7 to -0.5    066  065  064  064  063  062  062  063  064  067  079
-1.0 to -0.7    083  081  079  078  076  074  072  071  070  070  099
-1.4 to -1.0    078  075  072  069  066  063  059  055  050  041  000
-1.9 to -1.4    052  050  047  044  040  036  032  026  020  010  000
-2.3 to -1.9    018  017  016  014  012  010  008  006  003  001  000
-∞ to -2.3      011  010  009  007  006  004  003  002  000  000  000


Selection ratio: .9192
2.3 to ∞        011  011  011  012  012  012  012  012  012  012  012
1.9 to 2.3      018  019  019  019  019  020  020  020  020  020  020
1.4 to 1.9      052  053  054  055  056  057  057  057  057  057  057
1.0 to 1.4      078  079  081  082  083  084  084  085  085  085  085
0.7 to 1.0      083  084  086  087  088  089  090  090  091  091  091
0.5 to 0.7      066  067  068  069  070  071  071  072  072  072  072
0.3 to 0.5      074  074  075  076  077  078  078  079  080  080  080
0.1 to 0.3      078  078  079  080  080  081  083  084  085  085  085
0.0 to 0.1      040  040  040  040  041  041  042  042  043  043  043
-0.1 to 0.0     040  040  040  040  040  041  041  042  043  043  043
-0.3 to -0.1    078  078  078  078  079  079  080  082  083  085  085
-0.5 to -0.3    074  073  073  073  073  073  074  075  077  079  080
-0.7 to -0.5    066  066  065  065  065  065  065  066  068  071  072
-1.0 to -0.7    083  082  081  080  079  079  079  079  080  084  091
-1.4 to -1.0    078  076  075  073  072  070  068  067  065  066  085
-1.9 to -1.4    052  051  049  047  045  043  040  037  032  026  000
-2.3 to -1.9    018  017  017  015  014  013  011  009  007  003  000
-∞ to -2.3      011  010  009  009  008  006  005  003  002  000  000
Selection ratio: .9713
2.3 to ∞        011  011  011  011  011  011  011  011  011  011  011
1.9 to 2.3      018  018  018  018  019  019  019  019  019  019  019
1.4 to 1.9      052  053  053  053  053  054  054  054  054  054  054
1.0 to 1.4      078  079  079  080  080  080  080  080  080  080  080
0.7 to 1.0      083  084  084  085  085  086  086  086  086  086  086
0.5 to 0.7      066  067  067  067  068  068  068  068  069  069  069
0.3 to 0.5      074  074  074  074  075  075  076  076  076  076  076
0.1 to 0.3      078  078  079  079  079  080  080  080  080  080  080
0.0 to 0.1      040  040  040  040  040  041  041  041  041  041  041
-0.1 to 0.0     040  040  040  040  040  040  041  041  041  041  041
-0.3 to -0.1    078  079  078  078  079  079  079  080  080  080  080
-0.5 to -0.3    074  073  074  074  074  074  074  075  076  076  076
-0.7 to -0.5    066  066  066  066  066  066  067  067  068  068  069
-1.0 to -0.7    083  083  082  082  082  082  082  083  084  085  086
-1.4 to -1.0    078  077  077  076  075  075  075  075  076  078  080
-1.9 to -1.4    052  052  051  050  049  048  047  046  045  045  054
-2.3 to -1.9    018  018  017  017  016  016  015  014  012  010  000
-∞ to -2.3      011  011  010  010  009  008  007  006  004  002  000











the proportions expected in each criterion
category when r is .0, .1, .2, .3, .4, .5, .6, .7,
.8, .9, or 1.0. The criterion categories are
listed in the first column. Each sigma value
that separates two categories is exact to the
tenths digit, and therefore required no
interpolation in the bivariate tables. The sigma
values divide the normal curve into areas
whose proportions are given in the second
column where r is zero. With the exception of
2.3 and 0.0, these sigma values were also
used on the selection test scale to define the
cutoffs and, therefore, selection ratios. The
table consists of 14 sections, one for each
selection ratio.*

* Although the selection of sigma values in the
bivariate tables was somewhat arbitrary, an effort
was made to make Table 1 most useful by choosing
sigma values that (a) divide the normal curve into
a large number of categories with practically
significant proportions, and (b) represent selection ratios
that round to even percentages with a maximum
error of 2/10 of 1%.

Table 1 shows the practical significance of
specific changes in r and the selection ratio.
In general, as r increases, the proportions in
higher criterion categories increase, while the
proportions in low criterion categories
decrease. To compare r, r², and E, some of the
rows were combined, yielding 10 standard
deviation categories with the following
boundaries: -∞, -1.9, -1.4, -1.0, -0.5, 0.0, 0.5,
1.0, 1.4, 1.9, and ∞. With 14 selection ratios
under consideration, there are now 140
relationships between r and proportion expected.

The linearity of each relationship below an
r of .70 was estimated by computing the
correlation between r and proportion expected.
For 129 of the 140 relationships the
correlation between r and proportion expected is
equal to or greater than .95, which indicates
a high degree of linearity. When r was
transformed to r², the correlation decreased in 110
of the cases, was unchanged in 8 cases, and
increased in 22 cases. The correlation for the E
transformation of r decreased in 111 of the
cases. Thus, proportion expected is more
nearly a linear function of r than of r² or E.
Moreover, 16 of the 22 cases in which r² is
more linear than r pertain to the two
categories separated by the mean, the two least
likely to be of practical significance in applied
situations.
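
One of the 140 linearity checks can be reproduced as a rough sketch. The proportions below are for the combined category above 1.9 sigma at selection ratio .1587, obtained by adding the top two rows of that section of Table 1 for r = .0 through .7. The helper name is hypothetical. The inclusion of r = .70 itself and the formula E = 1 - sqrt(1 - r²) are assumptions.

```python
import math

def pearson(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_values = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
# Table 1, selection ratio .1587: rows "2.3 to infinity" plus "1.9 to 2.3".
proportion = [0.029, 0.040, 0.052, 0.068, 0.084, 0.102, 0.122, 0.142]

r_squared = [r * r for r in r_values]
e_values = [1.0 - math.sqrt(1.0 - r * r) for r in r_values]  # assumed formula for E

corr_r = pearson(r_values, proportion)
corr_r2 = pearson(r_squared, proportion)
corr_e = pearson(e_values, proportion)

print(f"r: {corr_r:.3f}   r squared: {corr_r2:.3f}   E: {corr_e:.3f}")
```

For this one relationship the correlation with r exceeds .95 and is higher than the correlation with either transformation, the pattern reported for most of the 140 cases.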

Since proportion expected in criterion
categories is more nearly a linear function of r
than it is of r² or E, r is the more useful index
for this measure of practical significance.


Discussion 


Although Table 1 gives the proportion of 
selectees expected in 18 categories, expected 
proportions in larger categories can be de- 
termined by adding vertically adjacent num- 
bers in the table. It is even possible to de- 
termine the proportion for a dichotomy by 
adding the numbers above or below a category 
boundary in Table 1 and subtracting the sum 
from 1.00. Eighteen dichotomies are possible, 
whereas only 11 are possible using the Taylor- 
Russell tables. 
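
As an illustration of collapsing the table into a dichotomy, the sketch below adds one column of Table 1 (selection ratio .1587, r = .5) above a chosen boundary. The data layout and function name are hypothetical conveniences. Entries are in thousandths, as printed.

```python
NEG_INF = float("-inf")

# (lower boundary of category in sigma units, proportion in thousandths),
# copied from Table 1 for selection ratio .1587 and r = .5, top row first.
column = [
    (2.3, 43), (1.9, 59), (1.4, 136), (1.0, 156), (0.7, 132), (0.5, 88),
    (0.3, 83), (0.1, 73), (0.0, 33), (-0.1, 30), (-0.3, 50), (-0.5, 39),
    (-0.7, 28), (-1.0, 27), (-1.4, 16), (-1.9, 6), (-2.3, 1), (NEG_INF, 0),
]

def proportion_above(boundary, rows):
    """Sum the categories whose lower boundary is at or above `boundary`."""
    return sum(p for lower, p in rows if lower >= boundary) / 1000.0

above = proportion_above(1.0, column)   # selectees expected above +1.0 sigma
below = 1.0 - above                     # remainder of the dichotomy

print(f"above +1.0 sigma: {above:.3f}; at or below: {below:.3f}")
```

The same function applied at any other listed boundary gives a different dichotomy from the same column.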

Another use of the table is to forecast the
practical significance of specific changes in r
and/or the selection ratio. By referring to the
table, one can see in each criterion category
the expected consequences of improving the
test, that is, raising r a certain amount, or
the expected consequences of changing the
selection ratio a certain amount. Similarly, if
a shorter test is desired even though a lower
r can be expected, one can see the practical
consequences of a specific reduction in r.

Table 1 can also be used to estimate the
probability that observed data are from a
bivariate normal population, an assumption
that is often necessary when interpreting
sample statistics. Obtained proportions in the
criterion categories can be compared to the
expected proportions in Table 1 using the χ²
statistic (Guilford, 1965, p. 246). If the
computed χ² value is statistically significant, this
is evidence that either the sample is biased
or the population is not bivariate normal. This
is especially important when correcting r for
restriction of the range since most procedures
for correcting r assume bivariate normality
in the population and a random sample from
the correlation surface above the selection
test cutoff.
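
A minimal sketch of such a comparison follows. The expected proportions are the column for selection ratio .1587 and r = .5 in Table 1, collapsed into four categories. The observed counts are hypothetical and are included only to make the example run. The degrees of freedom (number of categories minus one) assume that r and the selection ratio are treated as known rather than estimated from the same sample.

```python
def chi_square_statistic(observed_counts, expected_proportions):
    """Pearson chi-square for observed counts against expected proportions."""
    n = sum(observed_counts)
    statistic = 0.0
    for observed, proportion in zip(observed_counts, expected_proportions):
        expected = n * proportion
        statistic += (observed - expected) ** 2 / expected
    return statistic

# Categories: above +1.0, 0.0 to +1.0, -1.0 to 0.0, below -1.0 (sigma units).
expected_proportions = [0.394, 0.409, 0.174, 0.023]  # collapsed from Table 1
observed_counts = [150, 170, 68, 12]                 # hypothetical sample of 400

statistic = chi_square_statistic(observed_counts, expected_proportions)
CRITICAL_05_DF3 = 7.815  # tabled chi-square value for p = .05 with 3 df

print(f"chi-square = {statistic:.2f}")
print("significant at .05" if statistic > CRITICAL_05_DF3 else "not significant at .05")
```

Categories with very small expected frequencies are usually combined before the statistic is computed, which is why the 18 rows are collapsed here.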

One limitation must be placed on the
conclusions in this paper regarding the relative
utility of r, r², and E as indexes of practical
significance: The means and proportions that
indicate practical significance pertain only to
selectees, that is, applicants above the
selection test cutoff. The criterion distribution,
whether actual or hypothetical, below this
cutoff is assumed to have no practical
significance for the institution using the selection
test. When the criterion is dichotomous, this
assumption is equivalent to assigning a zero
utility to two of the four decision-outcome
combinations in the fourfold table, namely to
reject-unsatisfactory and reject-satisfactory
(“false positives”). This is consistent with
the Cronbach and Gleser (1965) treatment
of selection testing.


GROUP AND INDIVIDUAL EFFECTS IN PROBLEM SOLVING 1


GEORGE S. ROTTER 2


Montclair State College 


AND STEPHEN M. PORTUGAL 


Long Island University 


It was hypothesized that when working on a single problem, the
combination of individual and group sessions would lead to more solutions than only
individual or only group sessions. When 128 male and female Ss were divided into 32
real and nominal work groups, the hypothesis was not confirmed. Instead,
the individual production of ideas was found to be superior to either group
production or the combination of group and individual production (p < .05).
In general, the production of ideas appears to be simply related to the
proportion of time spent working alone. Possible explanations are discussed and areas
for future research are presented.


In the ongoing debate over the relative
merits of group and individual problem
solving, Taylor, Berry, and Block (1958) made
a fundamental methodological contribution
when they divided into nominal four-person
groups the separate performances of 48
individuals. After duplicate answers were eliminated
from problems administered under Osborn’s
(1953, pp. 297-307) brainstorming conditions,
the findings indicated that the nominal groups,
that is, individual performance, were clearly
superior to an equal number of real
interacting groups in the production of ideas.
Dunnette, Campbell, and Jaastad (1963) and
Meadow, Parnes, and Reese (1959) had the
same Ss engage in both group and individual
sessions and their results essentially replicated
those of Taylor et al.

The explanations for the finding usually are
phrased in terms of either an inhibitory effect
created by being surrounded by other people
or the tendency for the public expression of an
idea in a group to cause all the members to
think along the same lines, leading to the
duplication of ideas. It is posited, however, that
individual and group conditions in brainstorming or
problem solving in general have their own
unique contributions to offer. Dunnette,
Campbell, and Jaastad (1963) did have the
same Ss work under both group and individual
circumstances. Their findings are limited,
however, because the shift in working conditions
was accompanied by a shift in problems. The
present study allows its Ss to work on the
same problem under both individual and group
conditions.


1 This paper is based, in part, on a Master’s thesis
by the author submitted to Long Island University.

2 Requests for reprints should be sent to George S.
Rotter, Psychology Department, Montclair State
College, Upper Montclair, New Jersey 07043.

Specifically, it is hypothesized that the 
number of solutions to a problem should be 
greater where S works in both a group and an 
individual setting than where he works in 
either a group or an individual setting. Be- 
cause of the assumed and generally reported 
superiority of individual over group condi- 
tions in problem solving, it is hypothesized 
further that for all conditions the number of 
solutions to a problem should be greater when
working individually than when working with
a group. 


METHOD


Subjects 


Recruited from introductory psychology courses at 
Long Island University were 64 males and 64 fe- 
males. Aside from their sex and the restrictions im- 
posed by the hours they were available, assignment 
to the various conditions was random. 


Materials 


Two problems, each having numerous answers, 
were utilized. 

Tourist problem. Each year a great many Ameri- 
can tourists go to visit Europe. But now suppose 
that our country wished to get many more European 
tourists to come to visit America during their vaca- 
tions. What steps can you suggest to get more
European tourists to come to this country? 

Education problem. Because of the rapidly
increasing birthrate beginning in the 1940s, it is now
clear that by 1980 public school enrollment will be
very much greater than it is today. In fact, it has
been estimated that if the student-teacher ratio were
to be maintained at what it is today, 50% of all
individuals graduating from college would have to be
induced to enter teaching. What different steps might
be taken to insure that schools will continue to
provide instruction at least equal in effectiveness to that
now provided?

Both of these problems were originally used by 
Taylor et al. (1958), the only change being the 
revision of the dates in the second problem so that 
they were appropriate to the present time. 


Experimental Design and Procedure 


Four experimental conditions were utilized, each 
composed of eight same-sex groups of four members: 
Condition I, working individually; Condition G, 
working with a group; Condition I-G, working
individually for the first half of the session and then
working with a group; Condition G-I, working with 
a group for the first half and then working in- 
dividually. Within each condition four groups con- 
sisted of males only and four consisted of females 
only. Within these subdivisions, two groups worked 
on the Tourist problem only while the other two 
groups worked on the Education problem only. 

Individual Conditions. Ss were told this was an 
experiment on creative thinking. They were read 
one of the problems twice and told to write down all 
the ideas, that is, solutions, they could think of. 
Quantity, not quality, was stressed. They were then 
sent to separate cubicles where they worked on their 
answers privately for 16 min. 

Group Conditions. As above, the Ss were told this 
was an experiment on creative thinking. They were 
read one of the problems twice and told to write 
down all the ideas they could think of and heard. 
Again, quantity, not quality, was stressed and they 
were urged to avoid any manifestation of criticism 
or ridicule of ideas. All answers were listed by each 
S on his own sheet of paper. In this instance, the 
four-person group sat around a large table while 
discussing their problem for 16 min. 

Individual then Group. The Ss were initially given 
instructions identical to those given Ss in the I con- 
dition. The exception was that they were permitted 
only 8 min. to work privately on their assigned 
problem. After this they were reassembled and given 
instructions identical to those administered to Ss in 
Condition G. They were also asked not to raise 
before the group the ideas produced privately. They 
then proceeded to work as an interacting group for an
additional 8 min. They used the same problem previ- 
ously assigned to them as individuals. 

Group then Individual. The Ss were initially given 
instructions identical to those given Ss in Condition 
G with the exception that they were permitted only 8
min. to work on the problem. Following this they 
were told they would now work on the same prob- 
lem separately and were given instructions identical 
to those provided in Condition I. They were asked 
not to use any of the same ideas raised during the 
group session and were sent to their respective 
cubicles for an additional 8 min. 

All questions were answered by reiterating relevant
portions of the instructions or by politely dismissing
irrelevant inquiries. Both groups and individuals
were asked to write down all ideas produced so as
to prevent groups from having an advantage over
individuals of having more time to think, an
objection raised by Zagona, Willis, and MacKinnon
(1966) to previous studies.


TABLE 1
FACTORIAL ANALYSIS OF VARIANCE OF NUMBER OF IDEAS

Source            df    MS         F
Conditions (A)     3    585.875    5.02*
Problems (B)       1     50.000
Sexes (C)          1    990.125    8.49*
A × B              3    390.917    3.35*
B × C              1     13.500
A × C              3     53.208
A × B × C          3    186.417    1.60
Error             16    116.688

* p < .05.


RESULTS


Scoring 


For each group of four Ss, whether real or 
nominal, a single list was made of every re- 
sponse. If any idea was given more than once, 
the better-worded one was scored. 
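
The pooling step for a nominal group can be sketched as follows. This is a simplification: in the study, duplicates were judged by meaning and the better-worded version was kept, whereas the sketch treats two ideas as duplicates only if they match after trimming and lowercasing, and keeps the first one encountered. The function name and the sample ideas are hypothetical.

```python
def pooled_unique_ideas(member_lists):
    """Merge the idea lists of a group's members, keeping one copy of each idea."""
    seen = set()
    pooled = []
    for ideas in member_lists:
        for idea in ideas:
            key = idea.strip().lower()  # crude stand-in for judging duplicates
            if key not in seen:
                seen.add(key)
                pooled.append(idea.strip())
    return pooled

# Hypothetical responses of four Ss who worked alone on the Tourist problem.
nominal_group = [
    ["Lower air fares", "Advertise abroad"],
    ["advertise abroad ", "Simplify visas"],
    [],
    ["Simplify visas", "Host festivals"],
]

group_list = pooled_unique_ideas(nominal_group)
print(len(group_list), group_list)
```

The group score is then the length of the pooled list, so that real and nominal groups are compared on the same footing.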


Individual versus Group Solutions 


A factorial analysis of variance (Table 1)
shows that the different experimental
conditions of problem solving led to significantly
different rates in the production of ideas.
Based on the means shown in Table 2, the
combination of the two mixed conditions (G-I
and I-G) produced significantly more
solutions (X̄ = 53.5) than did Condition G
(t = [illegible], df = 22, p < [illegible]). Contrary to
expectations, however, Condition I resulted in
a significantly greater number of ideas than
the combination of the mixed conditions
(t = 2.21, df = 22, p < .05). Hence, the
hypothesis that the number of solutions to a
problem should be greater where S works in
both a group and an individual setting than
where he works in either a group or an
individual setting was not supported.
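
The F ratios in Table 1 follow directly from the mean squares, each effect being tested against the error mean square. The short check below recomputes them. The dictionary layout is a convenience of this sketch, not part of the original analysis.

```python
# Mean squares copied from the analysis of variance in Table 1.
mean_squares = {
    "Conditions (A)": 585.875,
    "Problems (B)": 50.000,
    "Sexes (C)": 990.125,
    "A x B": 390.917,
    "B x C": 13.500,
    "A x C": 53.208,
    "A x B x C": 186.417,
}
MS_ERROR = 116.688  # error mean square, 16 df

# F for each source is its mean square divided by the error mean square.
f_ratios = {source: ms / MS_ERROR for source, ms in mean_squares.items()}

for source, f in f_ratios.items():
    print(f"{source:<16s} F = {f:.2f}")
```

Ratios below 1.00 correspond to the sources for which no F is printed in the table.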

As hypothesized, individual problem
solving (I) was more productive in the generation
of different solutions than was group problem
solving (G) (Table 2; t = [illegible], df = 14,
p < .02).


TABLE 2
MEAN NUMBER OF IDEAS PRODUCED UNDER EACH CONDITION OF GROUP AND INDIVIDUAL PROBLEM SOLVING

                       Group phase           Individual phase
Condition   Mean     Mean   Corrected a     Mean   Corrected a
I           63.8                                    (63.8) b
G           42.9            (42.9) b
I-G         52.6     19.7    39.4           33.0     66.0
G-I         54.5     19.0    38.0           35.5     71.0

a Doubled to compensate for half-time.
b Same as the observed mean.

In a similar fashion, within Conditions I-G 
and G-I the individual sessions produced 
more solutions than the group sessions (Table 
2). Since each part of the mixed conditions 
lasted 8 min., or half the time for the pure 
conditions, the mean values of the individual 
and group phases, taken separately, were 
doubled. As seen in Table 2, these means 
parallel closely the means for Conditions 
I and G. 
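
The correction amounts to doubling each 8-min. phase mean so that it can be set beside the 16-min. means of the pure conditions. A brief sketch using the Table 2 values follows. The variable names are hypothetical, and the G-I individual phase mean of 35.5 is inferred from its doubled value of 71.0.

```python
# Observed 8-min. phase means from Table 2.
phase_means = {
    "I-G": {"group": 19.7, "individual": 33.0},
    "G-I": {"group": 19.0, "individual": 35.5},
}
pure_means = {"group": 42.9, "individual": 63.8}  # Conditions G and I, 16 min.

# Double each half-session mean to put it on a full-session footing.
corrected = {
    condition: {phase: 2 * mean for phase, mean in phases.items()}
    for condition, phases in phase_means.items()
}

for condition, phases in corrected.items():
    for phase, value in phases.items():
        print(f"{condition} {phase:<10s} corrected {value:.1f}  vs. pure {pure_means[phase]:.1f}")
```

Doubling assumes that ideas would have continued at the same rate for a further 8 min., which is the premise of the comparison rather than something the data establish.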


Other Findings 


The source of the significant interaction be- 
tween problem type and problem solving con- 
ditions (Table 1) can be traced to the mixed 
conditions. As seen in Table 3, for the Tourist 
problem Condition G-I led to more solutions 
than Condition I-G; conversely, for the Edu- 
cation problem Condition I-G led to more 
solutions than Condition G-I. By itself, prob- 
lem type did not produce any effect. 

TABLE 3

MEAN PRODUCTION OF IDEAS UNDER CONDITION,
SEX, AND PROBLEM TYPE

          |    Tourist problem    |   Education problem
Condition | Male | Female | X     | Male | Female | X
I         | 78.0 | 52.0   | 65.0  | …    | …      | …
G         | 37.0 | 34.0   | 35.5  | 58.0 | 42.5   | 50.2
I-G       | 52.0 | 40.5   | 46.2  | 71.0 | 47.0   | 59.0
G-I       | 66.5 | 57.5   | 62.0  | 48.0 | 46.0   | 47.0

As also seen in Table 3, males produced
consistently and significantly more ideas than females.
This difference did not vary from condition
to condition or from problem type to
problem type, as indicated by the very low 
F ratios for those interactions involving sex 
(Table 1). 


DISCUSSION


The findings confirm the hypothesis that 
the individual production of ideas is superior 
to production within a group setting. This 
holds true whether individual work precedes, 
follows, or is independent of group work. 

It was also posited that individual and 
group settings could make their separate and 
unique contributions to the problem-solving 
process. However, the hypothesis that mixed 
conditions should lead to greater productivity 
than pure conditions was not supported. 
While the mixed conditions did produce more 
solutions than the group condition, they pro- 
duced fewer solutions than the individual 
condition. 

In explanation, it can be seen that the 
combined mean of both mixed conditions 
within each problem type is almost at the 
midpoint between the means for Conditions 
I and G (Table 3). Since each of the mixed 
conditions was divided into two equal parts— 
one group and the other individual—one may 
argue that the production of ideas is simply 
a function of the proportion of time spent in 
an individual situation. In other words, the 
mixed conditions were superior to the group 
condition not because they allowed a com- 
bination of different working conditions but 
because they contained a period of individual 
problem solving. Similarly, since Condition I 
permitted twice as much time for individual 
work as the mixed conditions, its production
was greater. 
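
The proportional-time argument can be checked with a small calculation. The sketch below assumes that output is a simple weighted average of the pure-condition means, weighted by the share of time spent working individually. It uses the overall means from Table 2 rather than the per-problem means of Table 3, and the function name `predicted_mean` is hypothetical.

```python
# Overall means for the pure conditions (Table 2).
mean_individual = 63.8
mean_group = 42.9


def predicted_mean(individual_fraction):
    """Predict idea output as a time-weighted mix of the two pure means."""
    return (individual_fraction * mean_individual
            + (1 - individual_fraction) * mean_group)


# Mixed conditions spent half of the session in individual work.
predicted_mixed = predicted_mean(0.5)

# Observed means for Conditions I-G and G-I (Table 2), averaged.
observed_mixed = (52.6 + 54.5) / 2

print(f"predicted: {predicted_mixed:.2f}, observed: {observed_mixed:.2f}")
```

Under this assumption the predicted value for the mixed conditions falls within about a fifth of an idea of the observed combined mean, which is the pattern the explanation relies on.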

Why do this and other studies (e.g., Dun- 
nette et al., 1963; Taylor et al., 1958) con- 
tinue to find group brainstorming inferior to 
working individually? As mentioned earlier, 
one explanation holds that group discussion 
channels thinking along similar lines. Another 
suggests that only those ideas which are so- 
cially acceptable will be voiced within a group 
context. 

As an alternative explanation, perhaps 
group participation has been placed at a dis- 
advantage because only relatively short time 
intervals, ranging from 5 to 16 min., have 
been utilized in research. Hence, only the 
initial spurts in thinking are being tapped. In 
group sessions most participants might not 
have time to record their own spurts while 
listening to and recording those of the others,
so their own ideas may be temporarily forgotten.
Allowing longer group sessions might permit 
the recall of these items and, hence, lead to 


the increased production of ideas. It also fol- 
lows that for group conditions, the production 
of responses would decrease less over time 
than the production of ideas under individual 
conditions. 
SOME RESULTS OF THREE BASIC SKILLS TRAINING
PROGRAMS IN AN INDUSTRIAL SETTING 


WILLIAM G. MOLLENKOPF 1 


The Procter & Gamble Company 


Test scores obtained before and after instruction were analyzed to evaluate 
outcomes of three types of training. The training programs were designed to 
improve basic skills of present and prospective employees in production, office, 
and laboratory work. In most of the learning situations, groups made
significant gains. Individual true gains also were studied, using a method
developed by Lord.


In recent years, a number of industrial 
firms have offered to employees or prospective 
employees new training programs intended to 
improve basic skills such as reading and 
arithmetic (Burck, 1968; Gassler, 1967; 
Gustaitis, 1967; Janger, 1969). Such pro- 
grams have particular relevance for those 
who lack adequate opportunity for good edu- 
cation. 

The following reports describe three types 
of training programs, carried out in Procter 
& Gamble in 1967-68, oriented toward (a) 
production jobs, (b) typing and secretarial
work in offices, and (c) technician positions 
in laboratories. Following description of the 
training, test data are presented which reflect 
in part the outcomes of participation in the 
program. These group data are supplemented 
by information about individual gains, pre- 
sented in a later section. 


PRODUCTION EMPLOYEE GROUPS 


The MIND (Methods of Intellectual De- 
velopment) program (Gustaitis, 1967) was 
used with groups at two manufacturing 
plants. This program consisted of two parts, 
Communications Skills and Mathematics. The 
first part included instruction on vocabulary, 


1 Requests for reprints should be sent to the 
author, The Procter & Gamble Company, P. O. Box 
599, Cincinnati, Ohio 45201. 
word building with prefixes and suffixes,
reading aloud, spelling, and grammar. This 
instruction was given largely through the use 
of workbooks. The Mathematics training was 
given principally through the use of tapes 
which presented problems of increasing com- 
plexity at faster and faster rates. Both parts 
provided some opportunity for adaptation to 
individual differences. Instruction was given 
before or after the work shift, that is, on the 
employee’s own time. Costs of instruction 
were borne by the employer. 


Plant A 


At this plant, 17 men participated in part 
or all of the MIND program. Of those who 
continued through the program, 10 took both 
Communications Skills and Mathematics, 4 
took Mathematics only, and 1 took Com- 
munications Skills only. Two dropped out 
during the course. Ages of the participants 
ranged from 21 to 47, with a median of 29. 
Years of formal schooling reported varied 
from 6 to 14; the median was 12. About one- 
third of the men were Negroes. 

Instruction was given in two 2-hr. sessions 
per week over a 20-wk. period, on a total of 
78 days. Trainees, other than those who 
dropped out, varied in number of days at- 
tended, from a high of 72 to a low of 33, 
with a median of 66. Shift rotation was a
frequent cause of nonattendance. 

Parallel forms of the Paragraph Meaning 
and Arithmetic Computation sections of the 
Stanford Achievement Test, Intermediate II 
Battery (Kelley, Madden, Gardner, & Rud- 
man, 1964) were given prior to and following 
completion of the MIND program to provide 
information on changes in skill levels as a 
result of the training. 

Paragraph meaning. Preinstruction grade- 
equivalent scores for the 11 men who took 
the Communications Skills training ranged 
from 5.0 to 12.9, with a mean of 8.5 and a 
median of 8.0. After-instruction scores ranged 
from 5.6 to 11.5, with a mean of 9.4 and a 
median of 10.6. The difference between the 
means did not reach significance at the .05
level (t = 1.33, df = 9).

Arithmetic computation. Grade-equivalent 
scores obtained prior to the beginning of in- 
struction by the 14 men who participated in 
training in this area ranged from 6.0 to 11.7, 
with a mean of 8.2 and a median of 7.9. At 
the end of instruction, scores ranged from 
10.5 to 12.9, with a mean of 12.1 and a 
median of 12.6. More than half of the group 
had a postcourse score at or very close to the 
maximum grade score obtainable (12.9), so 
there was a definite ceiling effect here. The 
difference between means was significant beyond
the .01 level (t = 3.27, df = 11).
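
For readers who want to see how a before-and-after comparison of this kind can be computed, the following is a minimal sketch. It assumes a paired (dependent-samples) t test, which the report does not name explicitly, and the scores in `pre` and `post` are invented for illustration; they are not the Plant A data.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a paired t test on matched pre/post scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same trainees")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    # Standard error of the mean difference uses the sample SD (n - 1).
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical grade-equivalent scores for four trainees.
pre = [6.0, 7.5, 8.0, 9.0]
post = [8.0, 9.0, 10.5, 10.0]

t_value, df = paired_t(pre, post)
print(f"t = {t_value:.2f}, df = {df}")
```

With a paired design the degrees of freedom equal the number of trainees with both scores minus one, so only trainees tested on both occasions contribute to the comparison.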


Plant B 


At this plant, 29 men participated in the 
program. Their ages ranged from 20 to 52, 
with a median of 37. Years of school reported 
as completed ranged from 5 through 12, with 
a median of 10. About two-thirds of the men 
were Negroes. 

Instruction was offered in three 2-hr. ses- 
sions per week, over a 20-wk. period. Each 
participant could have received 116 hours of 
instruction had he attended all sessions. How- 
ever, actual hours for the 27 men for whom 
this information was available ranged from a 
low of 14 to a high of 94. Nine persons 
dropped out; 1 man started late. 

At this plant, parallel forms of the Reading 
and Arithmetic Computation Tests of ABLE 
(Adult Basic Learning Examination; Karl- 
sen, Madden, & Gardner, 1967) were given 
before and at the end of the MIND program.
Besides the nine dropouts and the man who 
began late, there were three other persons for 
whom test records were incomplete. 

Reading. Preinstruction grade-equivalent 
scores ranged from 4.1 to 9.0+ (the top score 
obtainable), with a mean of 7.4 and a median 
of 8.1. Two were at the top score. Scores at 
the end of instruction ranged from 4.7 to 
9.0+, with a mean of 8.0 and a median of 
8.7. Six of the group attained the top score; 
again, a definite ceiling effect existed here. 
The difference between means was significant 
at the .05 level (t = 2.85, df = 14).

Arithmetic computation. Preinstruction 
grade-equivalent scores ranged from 3.9 to 
6.9, with a mean of 5.2 and a median of 5.1, 
scores clearly and appreciably lower than 
those for Reading. Scores after instruction 
ranged from 3.0 to 9.0+, with a mean of 6.8 
and a median of 7.3. Four men achieved the 
top score. The difference between means was 
significant at the .05 level (t = 2.88, df = 14).


OFFICE WORKER GROUPS 


During the 2 years, four small groups of 
women were given special training in order 
that they might be better prepared for secre- 
tarial and clerical positions in the main of- 
fices and technical centers of the firm. All 
participants were Negroes. Most were recent 
high school graduates, and single. A total of 
43 women began the training, and all but 2 
completed it. Most of these were offered, and 
accepted, regular office positions. 

Among the instructional topics, four con- 
tent areas were given principal attention: 
spelling, grammar, vocabulary, and arithme- 
tic. In spelling, emphasis was placed on words 
frequently misspelled in office work. Numer- 
ous informal oral quizzes were used, and stu- 
dents were encouraged to read broadly and 
use the dictionary to build their vocabularies. 

For instruction in grammar, a programmed 
text was used with three groups, but aban- 
doned as not sufficiently suitable for the 
fourth. With the fourth group, emphasis was 
placed on conversational grammar.

For improving arithmetic skills, large num- 
bers of problems involving basic operations 
were given, and difficulty and time pressure 
were gradually increased. Word problems
were introduced to give practice in thought- 
ful application of number skills. 

Arrangements for the instruction varied 
somewhat from group to group, but typically 
involved two or three sessions per week over a 
period of several weeks, with a total of about 
60 hr. Participants were paid as part-time 
employees. 

Pre- and posttest scores were available for 
most participants completing the program 
from administration of two parallel forms of 
a five-part test devised by the firm’s person- 
nel research group. Areas covered by this 
test were spelling, expression (grammar), fil- 
ing, arithmetic, and reasoning. 

Spelling. The test used presents a list of 30 
frequently misspelled words, about half of 
which are spelled correctly. The task is to 
indicate whether the word is spelled cor- 
rectly or not. Pretest scores had a mean of 
20.6 and a median of 20.0; posttest scores 
had a mean of 22.7 and a median of 24.0. 
The difference between the pre- and post- 
instruction means was significant at the .01 
level (t = 3.46, df = 38).

Expression. The 15 items in this test in- 
volve incomplete sentences. The task is to 
choose words or punctuation marks which 
best complete the sentences in terms of good 
usage. The mean of the before-training scores 
was 8.6, and the median was 8.5. For the 
after-training scores, the mean was 9.2 and 
the median 9.0. The difference between means 
was significant at the .05 level (t = 2.30, df = 39).

Filing. Fifty names of firms are given, and 
the task is to “file” these in the appropriate 
simulated file pockets, working as rapidly and 
accurately as possible. The before-training 
scores had a mean of 28.6 and a median of 
29.0; the mean of the after-training scores 
was 34.8 and the median was 36.0. The dif- 
ference between means was significant at the 
.01 level (t = 3.82, df = 39).

Arithmetic. The test used consists of 60 
short problems in addition, subtraction, mul- 
tiplication, and division. Answers are provided 
and are to be marked as right or wrong. The 
mean of the before-training scores was 25.2 
and the median, 26.0. For after-training 
scores, the mean was 30.3 and the median 
was 30.5. The difference between means was
significant at the .01 level (t = 4.81, df = 39).

Reasoning. This test consists of 10 verbal 
analogies items and 10 arithmetic reasoning 
items. Before-training scores had a mean of 
8.1 and a median of 8.0. For after-training 
scores, the mean was 10.0 and the median was 
10.0. The difference between means was significant
at the .01 level (t = 3.75, df = 39).


LABORATORY TECHNICIAN TRAINING GROUP 


This program was designed to offer train- 
ing to persons potentially qualifiable for 
employment as laboratory technicians, but 
lacking the present levels of competence ordi- 
narily sought in areas such as elementary 
mathematics, basic chemistry, and English 
usage. 

Twenty-six individuals began the training 
program. Nineteen were male, 7 female. All 
but one were high school graduates or the 
equivalent. Most of the participants were 
Negroes. 

Instruction in mathematics and English 
placed strong emphasis on use of workbooks. 
The two instructors also utilized many other 
resources: books on communication skills and 
general mathematics; dictionaries; periodi- 
cals such as Science News, Readers’ Digest 
and the New York Times; audiovisual ma- 
terials and equipment; and commercially de- 
veloped programs concerned with improve- 
ment of reading, listening, and study skills. 

After an initial orientation period, partici- 
pants spent approximately half their time in 
the classroom activities described above and 
the other half in on-the-job training in laboratory
situations, learning more about the work
of a technician. Class participants were employed
with the understanding that satisfactory
completion of this classroom and job-training
program would lead to regular employment
as a technician trainee.

Twenty-five of the classroom participants 
completed the formal instructional program, 
and all of these were offered regular employ- 
ment as technician trainees. 

Learning measures. Throughout the course 
of the training period quizzes were frequently 
used. However, in only one instance was 
TABLE 1

INDIVIDUAL GAINS

             |                                                  | Estimated true individual gains
Study        | Test                                             | Range          | Median | Percent showing + gains
Plant A      | Paragraph Meaning (grade-equivalent score)       | …              | …      | 73
             | Arithmetic Computation (grade-equivalent score)  | +1.7 to +5.1   | +4.2   | 100
Plant B      | Arithmetic Computation (grade-equivalent score)  | -.9 to +3.0    | +1.8   | 81
Office group | Spelling (raw score)                             | -1.6 to +6.2   | …      | 87
             | Filing (raw score)                               | -11.8 to +22.2 | +7.2   | 90
             | Numerical Operations (raw score)                 | -2.7 to +12.1  | +5.0   | 98
             | Reasoning (raw score)                            | -.3 to +5.0    | +1.7   | 95

there a before-and-after type of measure- 
ment. 

During the orientation period, the students 
were given a 100-item test on arithmetic 
processes and manipulation of whole numbers, 
fractions, decimals, percentages, ratios and 
proportions, and formulas. Scores ranged from 
36 to 90, with a mean of 61.5. Seven weeks 
later, the same test was administered again 
without prior notice. The nature of the con- 
tent was such that in the intervening time 
many similar problems would have been en- 
countered, so recall of the original answers 
would be quite unlikely. For every one of the 
25 students taking the test again, an increase 
in score occurred, ranging from half a point 
(by one of the top scorers on the first administration)
to 29½ points (by the lowest scorer
on the first attempt). Final scores varied from 
45 to 98, with a mean of 75.2. The difference 
between means was significant at the .01 
level (t = 4.28, df = 24).


INDIVIDUAL GAINS 


In addition to changes in mean scores, it is 
of interest to assess the gains made by indi- 
viduals during the course of instruction. Often 
the observed change in test score is treated 
as a gain. But Lord (1956, 1958, 1963) and 
others have pointed out that the use of the 
observed score difference as a measure of gain 
may be quite misleading. Both regression 
effects and errors of measurement need to be 
taken into account. 

For several of the sets of plant and office 
data, it was possible to apply the regression 


equation approach developed by Lord (1956) 
which assumes the standard errors of mea- 
surement are the same for the pre- and post- 
test scores. Table 1 summarizes the results 
obtained for those situations in which this 
approach fit the data well enough to yield 
meaningful outcomes. 
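
Why an observed difference can mislead may be easier to see with a small numerical sketch. The code below is not Lord's procedure. It substitutes the simpler Kelley regressed-score formula, applied separately to the pretest and posttest, to show how unreliability pulls extreme observed scores toward the group mean. The reliability value, the group means, and both function names are assumptions made for illustration.

```python
def kelley_true_score(observed, group_mean, reliability):
    """Kelley's regressed estimate of a true score.

    The estimate moves the observed score toward the group mean,
    more strongly when the test is less reliable.
    """
    return reliability * observed + (1 - reliability) * group_mean


def regressed_gain(pre, post, pre_mean, post_mean, reliability):
    """Difference between regressed posttest and pretest estimates."""
    return (kelley_true_score(post, post_mean, reliability)
            - kelley_true_score(pre, pre_mean, reliability))


# Hypothetical trainee who starts well below the group pretest mean.
observed_gain = 9.0 - 5.0
estimated_gain = regressed_gain(pre=5.0, post=9.0,
                                pre_mean=8.0, post_mean=9.5,
                                reliability=0.8)

print(f"observed gain: {observed_gain:.1f}, regressed estimate: {estimated_gain:.1f}")
```

In this invented case the regressed estimate is smaller than the observed gain, because part of the trainee's low pretest score is treated as measurement error. This is the kind of regression effect the paragraph above warns about.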

For the first three rows of this table, gains 
are stated in terms of grade-equivalent scores 
which give the reader a feeling of the magni- 
tude of the individual true gains estimated. 
For the office group, it may help to note that 
the median gain reported was more than half 
the magnitude of the standard deviation of 
the pretest scores in each instance. 


DISCUSSION 


For each of the three programs—produc- 
tion employee, office worker, and technician 
trainee—the test results for the group or 
groups involved showed an increase in average 
score over the training period with the change 
in mean significant at or beyond the .05 
level in all but one instance, that for the 
smallest group. 

Methods of instruction varied from one 
program to another. With the differences in 
content and students involved, comparison of 
the relative effectiveness of the instructional 
approaches is not possible. What can be said 
is that each instructional method was associ- 
ated with improvement in the group average 
in each instance. 

Considering that the production workers 
and office trainees typically spent well under 
100 hr. in the training program, and that each
program had at least two major emphases, 
the gains made seem impressive. Results pre- 
sented here are evidence that workers, some 
well along in years, can make sizable gains in 
basic skills such as arithmetic. Unfortunately, 
relatively little good data are as yet available 
on the outcome of such special training pro- 
grams. So the gains observed in the situations 
reported on here cannot be measured against
reasonable expectations for outcomes from 
part-time instructional programs of several 
weeks to several months' duration. Such expectations
can be built up only from a number
of reports from various sources. There is a 
strong need for more data of the type reported 
here. 
RELATIONSHIP OF THE COMPONENTS OF AN ASSESSMENT
CENTER TO MANAGEMENT SUCCESS 


HERBERT B. WOLLOWICK AND W. J. MCNAMARA 1
IBM, Armonk, New York 


The purpose of this study was to determine the validity of an assessment- 
center approach in predicting management potential and to determine the 
relative value of the components of the program. Results indicate that the 
approach is valid and that situational tests add to the predictiveness of 
paper-and-pencil tests. Also demonstrated was greater predictiveness through 
statistical combination of the program variables, rather than a subjectively
derived overall rating.


Because of the ever-increasing need for 
capable managers, many companies are taking 
a critical look at their traditional methods of 
selecting individuals for managerial assign- 
ments. Further, there is a heightened interest 
in identifying the potential manager early in 
his career so that he may be properly groomed 
for high-level managerial responsibilities. The 
techniques developed by the Standard Oil 
Company of New Jersey (Laurent, 1961) 
clearly demonstrate the possibilities of early 
identification of managerial talent utilizing
paper-and-pencil tests and inventories. 

Another approach, which involves the use 
of situational tests (Flanagan, 1954) in ad- 
dition to written objective tests, has been 
adopted recently by several business organi- 
zations. The most extensive use of this type of 
assessment procedure for identifying mana- 
gerial talent has been by AT&T (Bray & 
Grant, 1966). Merits of this method have 
also been demonstrated by Albrecht, Glaser, 
and Marks (1964). 

One purpose of this paper is to present 
additional data on the validity of the assess- 
ment-center technique based on its use in a 
large electronics concern. The data available, 
however, are extensive enough to make possi- 
ble the investigation of other crucial ques- 
tions relative to the assessment center. One 
question frequently raised concerns the ad- 
vantages of the situational tests utilized in 
the assessment-center technique over and 
above the use of written (paper-and-pencil) 
instruments alone. Stated in another way, the 

1 Requests for reprints should be sent to Walter 
question is how much additional information
can be gained by the use of group and indi- 
vidual situational tests. These are the most 
costly aspects of the assessment-center ap- 
proach. If additional validity is obtained 
from the situational exercises, is the addi- 
tional gain sufficiently large to justify the 
time and cost involved? 

Another question relates to the length and 
complexity of the assessment-center approach. 
How many tests are necessary, how many 
individual exercises should be utilized, and 
how many group exercises should be a part 
of the program? A further question relates to 
the emphasis that should be placed on the 
various elements of the program. In the final 
evaluation of an individual participant in an 
assessment-center program, how much weight 
should be given to the written tests, how 
much weight should be given to the situational 
exercises, and more specifically, how much 
weight should be given to each of the indi- 
vidual written tests and situational exercises 
in the assessment program? 


METHOD 


The Ss in the study were 94 men from two divi- 
sions of a large electronics firm. They were all 
from lower and middle management positions and 
participated in the assessment program between 
September, 1964, and June, 1965. The men selected 
to participate in the program had been designated 
as having above-average potential for advance- 
ment; they were not a random selection of men at 
their level in the business. Most of the men were in 
their late 20s or early 30s, and their educational 
level was approximately 2 yr. beyond high school. 

The criterion used was the increase in managerial 
responsibility as of January, 1968, approximately 3 
yr. after participation in the assessment program. 
The measure of managerial responsibility used was
position-code level, a two-digit number that is de- 
termined by a job analysis of the position, taking 
into account the number of people supervised, com- 
plexity of the job, financial responsibility involved, 
skill required, etc. Thus, two individuals at the 
same level of management could have different posi- 
tion-code levels depending on the managerial re- 
sponsibility involved in their job. This criterion was 
used rather than merely the level of management 
attained, because it is a finer measure of advance- 
ment. Thus, in this study a scale of 12 increments or 
management steps was involved. 

The two-day assessment program was very simi- 
lar to that described by Bray and Grant (1966), and 
the group situational exercises were identical to 
those described by Greenwood and McNamara 
(1967). 

The written tests used in the assessment program
included cognitive ability tests, personality inventories,
measures of leadership ability, and background
history. The specific tests and subscores
were (a) Gordon Personal Profile (GPP), four scores:
Ascendancy, Responsibility, Emotional Stability, and
Sociability. (b) Gordon Personal Inventory (GPI), four
scores: Cautiousness, Original Thinking, Personal
Relations, and Vigor. (c) Fleishman’s Leadership Opinion
Questionnaire (LOQ), two scores: Structure and
Consideration. (d) Background and Contemporary
Data Form (BCD), a company-developed biographical
inventory scored with two keys: one for
self-confidence and one for characteristics of a
successful manager. (e) Otis Employment Test (Otis), total
score. (f) School and College Ability Test (SCAT), total
score combining verbal and arithmetic sections.

The group exercises were 

1. Leaderless Group Discussion: Each participant 
is required to make a 5-min. oral presentation of a 
candidate for promotion and then subsequently de- 
fend his candidate in a group discussion with five 
other participants. Characteristics rated are ag- 
gressiveness, persuasiveness or selling ability, oral 
communications, self-confidence, resistance to stress, 
energy level, and interpersonal contact. 

2. Manufacturing: Six participants are required to 
work together as a group and operate a manufac- 
turing company in an effective manner. They are 
required to purchase material, manufacture a product,
and sell it back to the market. Included in the
exercise are a product forecast and specific prices for raw
materials and completed products, which fluctuate
during the exercise. The participants are not given
assigned roles. Both verbal and physical activity are 
involved. Characteristics rated are aggressiveness, 
persuasiveness or selling ability, resistance to stress, 
energy level, interpersonal contact, administrative 
ability, and risk-taking. 

3. Task Force: Participants are given data from 
which they are to form a group recommendation to 
the president of a company regarding which of 
three alternative courses of action he should take 
for expansion of the company’s activities. Characteristics
rated are aggressiveness, persuasiveness or selling
ability, oral communications, self-confidence,
energy level, decision-making, and interpersonal con- 
tact. 

The individual exercises included 

1. In-Basket: Each individual is required to as- 
sume the position of a personnel manager in a 
manufacturing plant for one hour and a half. In 
this time, he must deal with internal and external 
correspondence which has accumulated since his 
predecessor was assigned to another position. For 
about 30 min., the participants are individually 
interviewed by an observer regarding the actions 
that they have taken and the reasons for their 
decisions. Characteristics rated are oral communica- 
tions, planning and organizing, self-confidence, 
written communications, decision making, risk taking, 
and administrative ability. 

2. Job Environment Report: This exercise requires 
the individual to describe in writing his job, the 
things he likes about his work, the things he dis- 
likes, and his relationship to his peers, subordinates, 
and supervisors. The participant is allowed 1 hr. to 
complete this exercise. Characteristics rated are 
planning and organizing, self-confidence, written 
communications, and interpersonal contact. 

3. Stock Market: The participant is given the 
role of an investor. He buys and sells stocks from a 
set money reserve under conditions of changing 
market values. The individuals write their own buy 
and sell orders, calculating their expenditures and 
profits. Characteristics rated are planning and or- 
ganizing, self-confidence, resistance to stress, de- 
cision making, administrative ability, and risk-taking. 

In addition to the scores on the written tests, 
several ratings for each participant in the program 
were obtained:

1. Overall Rating (OAR): At the conclusion of 
the 2-day assessment program, the four observers 
who formed the assessment staff and who were all 
operating management personnel at least two levels 
above the participants, discussed each participant’s 
performance and assigned him an OAR, taking 
into consideration all of the variables in the pro- 
gram. The OAR was on a 5-point scale: (a) ex- 
ceptional potential for advancement, (b) above- 
average potential, (c) average potential, (d) below- 
average potential, and (e) no potential. 

2. Exercise Ratings: In the group exercises, each
of the three observers rated the six participants on 
a 5-point scale: (a) much more effective than most 
of the group, (b) somewhat more effective, (c) 
about as effective as most of the group, (d) rela- 
tively ineffective, and (e) ineffective. The three ob- 
server ratings in each group exercise were averaged 
to produce the participant’s rating for the exercise. 
For the individual exercises a participant’s rating 
was based on a single observer. Reliability of the 
observer ratings in this program has been described 
in a previous paper by Greenwood and McNamara 
(1967). 

3. Characteristic Ratings: Twelve characteristics
were designated as important elements of managerial
performance: Self-Confidence, Written Communications,
Administrative Ability, Interpersonal
Contact, Energy Level, Decision Making, Resistance 
to Stress, Planning and Organizing, Persuasiveness 
or Selling Ability, Aggressiveness, Risk Taking, and 
Oral Communications. 

The specific characteristics (usually 4-7) that 
were displayed in a particular exercise (group or 
individual) were rated on a 5-point scale by a single 
observer. Then the ratings from the several exercises 
were averaged to obtain a final characteristic score. 
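
The two averaging steps described above can be sketched as follows. The numeric coding of the 5-point scale (1 for the most favorable category through 5 for the least) and all of the ratings shown are assumptions for illustration; the function names are hypothetical.

```python
from statistics import mean


def exercise_rating(observer_ratings):
    """Average the observers' ratings of one participant in one group exercise."""
    return mean(observer_ratings)


def characteristic_score(ratings_by_exercise):
    """Average one characteristic's ratings across the exercises that display it."""
    return mean(list(ratings_by_exercise.values()))


# Three observers rate one participant in the Leaderless Group Discussion.
leaderless_rating = exercise_rating([2, 3, 1])

# Self-Confidence is rated in several exercises, by a single observer in each.
self_confidence = characteristic_score({
    "Leaderless Group Discussion": 2,
    "Task Force": 3,
    "In-Basket": 4,
    "Stock Market": 3,
})

print(leaderless_rating, self_confidence)
```

Because each characteristic is rated in only some exercises, the number of ratings entering a characteristic score differs from one characteristic to another.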


RESULTS 


TABLE 1
CORRELATIONS WITH CHANGE IN POSITION LEVEL
(N = 94)

Variable                        | r
Tests                           |
  GPP                           |
    Ascendancy                  | .39**
    Responsibility              | …
    Emotional Stability         | -.18
    Sociability                 | …
  GPI                           |
    Cautiousness                | -.05
    Original Thinking           | .05
    Personal Relations          | -.17
    Vigor                       | …
  LOQ                           |
    Structure                   | …
    Consideration               | -.11
  SCAT, Total                   | …
  Otis, Total                   | .07
  BCD                           |
    Success                     | .14
    Self-Confidence             | …
Exercises                       |
  Manufacturing                 | …
  Leaderless                    | …
  Task Force                    | …
  In-Basket                     | …
  Job Environment               | .07
  Stock Market                  | -.07
Characteristics                 |
  Self-Confidence               | …
  Written Communications        | …**
  Administrative Ability        | .02
  Interpersonal Contact         | .00
  Energy Level                  | …
  Decision Making               | .29**
  Resistance to Stress          | …
  Planning and Organizing       | …
  Persuasiveness                | …
  Aggressiveness                | .24*
  Risk Taking                   | …
  Oral Communications           | .22*
Overall rating                  | .37**

* p < .05 (.20 and above).
** p < .01 (.25 and above).

Table 1 gives the correlations of the 33
variables (OAR, 14 test scores, 6 exercise
ratings, and 12 characteristic ratings)
with change in position level. From this 
table it will be noted that the OAR cor- 
relates .37 with the criterion (significant 
at the .01 level). However, one of the test 
scores, GPP, Ascendancy, correlates .39 with 
the criterion. 
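The significance of a product-moment correlation such as those reported here can be checked with the usual t transformation, t = r * sqrt(N - 2) / sqrt(1 - r^2), on N - 2 degrees of freedom. The sketch below is an illustration only and is not part of the original analysis; the function name is hypothetical, while N = 94 and the two correlations are taken from the text.

```python
import math


def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N = 94  # sample size reported for Table 1

# Correlations quoted in the text: OAR and GPP Ascendancy.
for label, r in [("OAR", 0.37), ("GPP Ascendancy", 0.39)]:
    print(f"{label}: r = {r:.2f}, t({N - 2}) = {t_for_r(r, N):.2f}")
```

The resulting t values can be compared with tabled critical values of t for 92 degrees of freedom.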

As a matter of fact, three of the eight 
scales of the Gordon tests are significantly 
related to the criterion. However, neither the 
mental ability tests nor the LOQ are signifi- 
cantly related to the criterion measure. 

Two of the three group exercises (Leader- 
less and Manufacturing) provided significant 
correlations, in addition to one of the indi- 
vidual exercises (In-Basket). 

Of the 12 characteristics, 9 had significant 
correlations with the criterion. From a factor 
analysis of the characteristics measured in the 
assessment program, McNamara and Green- 
wood (1967) identified 5 characteristics as 
belonging to an activity dimension (Self- 
Confidence, Energy Level, Persuasiveness, 
Aggressiveness, and Oral Communications). 
It will be noted that all 5 of these character- 
istics are significantly related to the criterion. 
Of the test scores included in the program, it 
will also be noted that Ascendancy, Vigor, 
and Self-Confidence from the BCD are all 
significantly related to the criterion. Thus, it 
is apparent that for this population, an activity
factor, however measured, is significantly
related to success in management.

Table 2 shows multiple Rs for each type
of predictor (i.e., tests, exercises, characteristics).
Multiple Rs are also shown for all
possible combinations of tests, exercises, and
characteristics. For each multiple R, the percentage
of variance (R²) is also given. The
figures shown are cut off at the point where
the next variable did not add significantly to
the correlation (F test on each term). The table
shows a multiple R of .39 for exercises alone,
which accounts for 15% of the criterion variance,
and a multiple R of .45 for tests alone,
which accounts for 20% of the criterion
variance. Combining tests with exercises and
characteristics increases the multiple R to
.62, which accounts for 38% of the criterion
variance. In comparison, the OAR correlated
.37 with the criterion, which accounts for
14% of the criterion variance.
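The percentages of variance quoted above are simply the squared correlations. As a quick arithmetic check (an illustration added here, not part of the original analysis; the helper name is hypothetical), the reported values can be reproduced as follows:

```python
def variance_explained(r: float) -> int:
    """Squared correlation expressed as a whole-number percentage."""
    return round(r * r * 100)


# Correlations quoted in the text.
reported = {
    "exercises alone": 0.39,
    "tests alone": 0.45,
    "tests, characteristics, and exercises": 0.62,
    "OAR": 0.37,
}

percentages = {name: variance_explained(r) for name, r in reported.items()}
for name, pct in percentages.items():
    print(f"{name}: R = {reported[name]:.2f}, variance accounted for = {pct}%")
```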
TABLE 2

MULTIPLE Rs AND R² FOR THE THREE TYPES OF
PREDICTORS SEPARATELY AND COMBINED

Predictors                              R             R²
Tests                                   .45           .20
Exercises                               .39           .15
Characteristics                         .41           .23
Tests & exercises                       .54           .29
Characteristics & exercises             [illegible]   .27
Characteristics & tests                 [illegible]   .30
Tests, characteristics, & exercises     .62           .38





Table 3 details the variables that contribute 
significantly to each multiple R in stepwise 
order. It is particularly interesting to note the 
order of the variables in the multiple R for 
all three types of measurement combined. 
First there is a test variable, then an exercise 
variable, then a characteristic variable, a test 
variable, an exercise variable, a characteristic 
variable, and then finally a test variable. This 
sequence indicates that all three types of 
variables contribute heavily to the predictive 
success of the program. 

In addition, it will be noted that Ascendancy,
Vigor, and the Manufacturing Rating
are activity measures. Administrative Ability
and Interpersonal Contact have low correlations
with the criterion but fairly high correlations
with one or more of the activity
measures. Additionally, the beta weights for
Administrative Ability and Interpersonal Contact
are negative, indicating that they are
operating as suppressor variables. Furthermore,
if that portion of the In-Basket rating
attributable to Administrative Ability is subtracted,
the activity dimension of the interview
accounts for the remaining variance.
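The suppressor pattern described above can be illustrated with the standard two-predictor formulas for standardized regression weights. This is a sketch under stated assumptions, not the authors' computation: the criterion correlations (.39 for Ascendancy, .02 for Administrative Ability) come from the text and Table 1, but the predictor intercorrelation of .40 is a hypothetical value, since the intercorrelations are not reported.

```python
def two_predictor_betas(r_y1: float, r_y2: float, r_12: float):
    """Standardized weights and R squared for a two-predictor regression."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared


r_y1 = 0.39  # Ascendancy with the criterion (from the text)
r_y2 = 0.02  # Administrative Ability with the criterion (Table 1)
r_12 = 0.40  # ASSUMED predictor intercorrelation; not given in the source

beta1, beta2, r_squared = two_predictor_betas(r_y1, r_y2, r_12)
print(f"beta (Ascendancy)             = {beta1:.3f}")
print(f"beta (Administrative Ability) = {beta2:.3f}")
print(f"R squared = {r_squared:.3f} versus {r_y1 ** 2:.3f} for Ascendancy alone")
```

Under these assumed values the second predictor receives a negative weight and still raises R squared, which is the signature of a suppressor variable.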

Bray and Grant (1966) present predictive 
correlations (against salary progress) of about 
the same magnitude for similar characteristics 
and exercises. However, their ability mea- 
sures (SCAT and others) yielded larger cor- 
relations, and their personality measures 
(EPPS and Guilford and Martin) yielded 
smaller correlations. It is difficult to deter- 
mine why these differences exist. One possible 
explanation for the greater predictiveness of 
the ability measures in the AT&T study is 
that the group might have been more heterogeneous.
The smaller correlations for the
personality measures in the AT&T study may
be attributed to the different tests used or to 
differences in the organization involved. 


DISCUSSION 


It is evident from the above results that
the subjectively derived OAR utilized in
this assessment program is a valid predictor
of management success. The correlation
of .37 (significant at .01 level) between
the OAR and change in the position-level
criterion substantiates this conclusion. However,
the data indicate that much higher
validities may be achieved by an empirical
combination of the scores and ratings obtained
in the program. Table 2 indicates that
multiple Rs based on tests alone, characteristics
alone, or exercises alone give higher
validities than their subjective combination.


TABLE 3

VARIABLES CONTRIBUTING TO MULTIPLE Rs
IN STEPWISE ORDER

Variable                                R
Tests
  Ascendancy                            .39
  Vigor                                 .45
Exercises
  In-Basket                             .32
  Manufacturing                         .39
Characteristics
  Self-Confidence                       .32
  Written Communications                .38
  Administrative Ability                .41
Tests & exercises
  Ascendancy                            .39
  In-Basket                             .46
  Vigor                                 .51
  BCD Success Key                       .52
  LOQ Consideration                     .54
Characteristics & exercises
  Self-Confidence                       .32
  In-Basket                             .38
  Administrative Ability                .44
  Manufacturing                         .49
  Interpersonal Contact                 [illegible]
Characteristics & tests
  Ascendancy                            .39
  Written Communications                .46
  Vigor                                 [illegible]
  Administrative Ability                [illegible]
  Self-Confidence                       [illegible]
Tests, characteristics, & exercises
  Ascendancy                            .39
  In-Basket                             .46
  Administrative Ability                .51
  Vigor                                 .56
  Manufacturing Rating                  .59
  Interpersonal Contact                 .60
  BCD Success Key                       .62

Additionally, if all three types of measure- 
ment are combined in a multiple R, the re- 
sulting correlation is further substantially in- 
creased. This indicates that all three measures 
contribute materially to the validity of the 
overall program. Table 3 shows the variables 
that contribute significantly when all three 
types of measurement are combined. As in- 
dicated in the results section, all three types 
of variables (tests, exercises, and character- 
istics) are equally represented in this combi- 
nation multiple. 

An important question frequently raised
(Bray & Grant, 1966) can also be partially
answered from these results. That is, can the
assessment procedure be justified in light of
its additional cost and time compared to the
use of paper-and-pencil tests alone? The
multiple R for tests alone (Table 2) is .45
with an R² of .20. The multiple R for tests,
characteristics, and exercises combined is .62
with an R² of .38. Inclusion of the elements
unique to the assessment-center procedure,
therefore, nearly doubles the criterion variance
accounted for. This indicates that the
assessment procedure makes a substantial
unique contribution to the prediction of management
success.
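The size of this gain can be expressed with the conventional F ratio for an increment in R squared between nested regression models. The sketch below is an added illustration, not a computation reported by the authors. It assumes the two-test equation (Ascendancy and Vigor) as the reduced model and the seven-variable equation from Table 3 as the full model, with N = 94; because the variables were chosen stepwise from a larger pool, the nominal F is optimistic.

```python
def f_increment(r2_full: float, r2_reduced: float,
                k_full: int, k_reduced: int, n: int) -> float:
    """F ratio for the gain in R squared when predictors are added."""
    numerator = (r2_full - r2_reduced) / (k_full - k_reduced)
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator


r2_tests, r2_all = 0.20, 0.38    # R squared values quoted in the text
k_tests, k_all, n = 2, 7, 94     # predictors per Table 3; N per Table 1

ratio = r2_all / r2_tests
f_value = f_increment(r2_all, r2_tests, k_all, k_tests, n)
print(f"variance ratio = {ratio:.1f}")
print(f"F({k_all - k_tests}, {n - k_all - 1}) = {f_value:.2f}")
```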

Bray and Grant (1966) arrived at a simi- 
lar conclusion by a more indirect method. 
They partialled out mental ability (SCAT) 
from judged ability and found that reliable 
variance still remained. They concluded that 
“the results thus indicate that the assessment 
process does contribute more than can be 
gained by the simple administration of 
paper-and-pencil ability measures.” The re- 
sults from this study certainly support and 
confirm this conclusion. 
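Partialling one variable out of a correlation, as Bray and Grant did with mental ability, uses the first-order partial correlation formula. The sketch below shows the formula only; all three input correlations are hypothetical values chosen for illustration and are not taken from either study.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# HYPOTHETICAL values: x = judged ability, y = criterion, z = mental ability.
r_xy, r_xz, r_yz = 0.40, 0.50, 0.30

r_partial = partial_r(r_xy, r_xz, r_yz)
print(f"r(judged ability, criterion | mental ability) = {r_partial:.3f}")
```

A partial correlation that remains clearly above zero corresponds to the "reliable variance still remained" finding described above.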

The results obtained may also be valuable 
in the future development of a shorter as- 
sessment program with maximum predictive 
results. When all of the variables are com- 
bined in a multiple R, only 7 of the 32 varia- 
bles studied contributed significantly. Based 
on these results, it may be possible to consider 
eliminating the paper-and-pencil tests not
contributing to the predictive validity and
substituting other tests. However, eliminating 
any of the exercises may not give enough 
opportunity to establish usable characteristic 
ratings, which are presently based on all the 
exercises. For example, utilization of only 
three of the six exercises may significantly 
reduce the reliability and validity of our 
characteristic ratings. More research is thus 
needed to determine which exercises could best 
be eliminated without an adverse effect on 
the characteristic ratings. Some preliminary 
work has already been done on the substitu- 
tion of another exercise for the Task Force 
Exercise. 

Another interesting possibility for improvement
of the program is indicated by the results.
The subjectively derived combination
of the variables (OAR) correlated .37 with
the criterion, while the statistical combination
gave a multiple R of .62. This suggests that
instead of deriving an OAR by subjective
means, it might be done more profitably by a
statistical procedure. This should greatly increase
the predictiveness of the program. In
order to accomplish this, however, further
research and cross-validation of these results
are needed. Meanwhile, improvement is possible
by emphasizing in the subjective determination
of the OAR those variables that show
the highest correlation with the criterion.
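The need for cross-validation can be made concrete with the familiar shrinkage (adjusted R squared) formula, 1 - (1 - R^2)(N - 1)/(N - k - 1). The sketch below is an added illustration, not part of the original analysis; it uses the reported R² of .38, N = 94, and the 7 variables retained in Table 3. Because those 7 were selected stepwise from 32 candidates, the shrinkage expected in a new sample is probably larger than this formula suggests.

```python
import math


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Shrinkage-corrected R squared for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2, n, k = 0.38, 94, 7  # values reported in the text and tables
r2_adj = adjusted_r_squared(r2, n, k)
print(f"adjusted R squared = {r2_adj:.3f}, adjusted R = {math.sqrt(r2_adj):.2f}")
```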
VOCATIONAL INTERESTS OF WOMEN:
A FACTOR ANALYSIS OF THE WOMEN’S FORM OF THE SVIB 1


KIRK E. FARNSWORTH 2


Iowa State University 


This study presents an attempt at better understanding of the women’s form of 
the Strong Vocational Interest Blank (SVIB). The responses of 671 women- 
in-general to each of the 400 items on Form T400R were intercorrelated and 
factor analyzed using the multiple group method and a variation of Lawley’s 
maximum likelihood procedure. Representative items for each of the resulting 
9 common and 17 more specific factors are noted, and descriptive labels are 
suggested. The procedure and resulting information are seen as distinctly 


different from previous analyses. 


It has been estimated that by 1970 one 
out of every three workers will be a woman 
(Miller, 1964); by 1980 75% of all college- 
educated women will be at work (Keyserling, 
1965). Accordingly, Arns (1958), Miller 
(1964), and others have emphasized the 
crucial importance of adequate vocational and 
educational guidance for females. Since the 
measurement of interests comprises an im- 
portant segment of such guidance, what is 
sorely needed is more precise knowledge about 
the psychological tests that are used to mea- 
sure the vocational interests of women. 

Construction of women’s vocational interest 
scales is difficult. A main cause is the inabil- 
ity to obtain homogeneous criterion groups, 
which results in difficulty in measuring hetero- 
geneous interest patterns among women 
(Super & Crites, 1962). Hogg (1928) ex- 
plained the measurement problems on the 
ground that women work not for the love of 
work but just to be busy, or that they tend 
to choose the occupations which offer the 
least resistance. Strang (1941) also concluded 
that women enter certain occupations for 
reasons other than genuine interest in the 
job. Strong (1943) specified the reasons as
convenience and “stopgap” until marriage,
referred to by others as premarital, noncareer,
filler, something-to-fall-back-on-in-case, and
occupational insurance.


1 Completed in partial fulfillment of the requirements
for the doctoral degree under the overall
supervision of Edwin C. Lewis and statistical supervision
of Leroy Wolins.

2 Requests for reprints should be sent to the
author who is now at the Counseling and Testing
Center, Schofield Hall, University of New Hampshire,
Durham, New Hampshire 03824.

According to Super (1945), women’s in- 
terests are more nearly universal (nonspecific) 
than men’s. Similarly, Darley and Hagenah 
(1955, p. 70) claim that “women’s interests 
are generally less channelized or less profes- 
sionally intense than are men’s.” Tyler 
(1956) hypothesizes a general attitude of 
wanting a career merely for the pursuit of 
any pleasant, congenial activity which offers 
itself until marriage, and McArthur (1958) 
suggests a “being”—rather than a “doing”— 
orientation. Warren (1959) summarizes other 
reasons: Women appear to have less clearly 
defined interests than do men, women are less 
intrinsically motivated, and women express 
their interests through more than one role. 
Harmon (1967b) concludes that women have 
a basic lack of vocational interest, but hastens 
to add that the pendulum is swinging away 
from work being described as a stopgap, to- 
ward work as an ongoing way of life (Har- 
mon & Campbell, 1968). 

In spite of the problems, the women’s form 
of the SVIB continues to be used to clarify 
women’s vocational interests. Perhaps the 
most efficient way to understand what in- 
terests are being measured is through factor 
analysis. Several factor analyses of the wom- 
en’s SVIB have been reported in the litera- 
ture (Anderson, 1965; Crissy & Daniel, 
1939; Darley, 1941; Strong, 1943). One of 
the common characteristics of these studies, 
however, is that they all made use of occupa- 
tional scale scores as the variables to be 
factor analyzed. Such a procedure, among
others, has been strongly criticized by Guilford
(1952) as being inappropriate.

The most probable reason for using scale 
scores, rather than the responses to individual 
items (as recommended by Guilford), was 
simply the lack of technical and computa- 
tional facilities to handle such a large number 
of variables. Such self-limiting boundaries 
also restricted the one study located by the 
author that did utilize items rather than scales 
in factor analyzing the women’s SVIB 
(Gernes, 1940). 

Gernes’ Ss were 500 Nebraska Teachers 
College freshman women. She selected 163 of 
the 410 items on the women’s SVIB and 
clustered them according to the item-group- 
ings on the inventory. Factor analysis of the 
8 resulting groups yielded a total of 35 un- 
labeled factors. Although it would perhaps 
have been easier and even less time-consum- 
ing rationally to determine the subgroupings 
of 163 items clustered into 8 groups, the value 
of the Gernes study lies in the way in which 
she approached the problem, operating as 
effectively as possible within the limits of 
highly restrictive computational techniques. 
Her research, therefore, provided an impetus 
for the present study. 


METHOD


The responses of 671 women-in-general to each of
the 400 items on Form T400R of the SVIB were
intercorrelated and the items grouped into 26 rationally
coherent clusters, on the basis of the magnitude
of their intercorrelations.3 The multiple group
method of factor analysis (Thurstone, 1947) was
followed, and 1 factor was extracted from each of
the clusters. The angular cosines among the 26
oblique factors were then obtained and in turn factor
analyzed, using a variation of Lawley’s maximum
likelihood procedure (Jöreskog, 1967). The resulting
9 common factors and the remaining 17 more specific
factors were then rotated orthogonally. Finally,
the factor loadings on the 26 factors of each of the
400 items were computed, and a 400 × 400 residual
table was obtained.
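The first stage of the procedure described above, extracting one factor per item cluster and then obtaining the cosines among the resulting oblique factors, can be sketched as follows. This is a simplified illustration, not the program used in the study: it defines each factor as the unit-weighted sum of the items in a cluster, leaves unities rather than communality estimates in the diagonal, and uses a hypothetical 4-item correlation matrix in place of the 400 SVIB items. The second-stage maximum likelihood analysis of the cosines and the orthogonal rotation are not shown.

```python
import math


def multiple_group(R, clusters):
    """Multiple group factoring of a correlation matrix R (list of lists).

    Returns (structure, cosines): item-by-factor correlations and the
    factor-by-factor cosines for one unit-weighted factor per cluster.
    """
    # Standard deviation of each cluster composite: square root of the sum
    # of all correlations among the cluster's items (unit diagonals included).
    sds = [math.sqrt(sum(R[i][k] for i in c for k in c)) for c in clusters]
    # Correlation of every item with every cluster composite.
    structure = [
        [sum(R[j][i] for i in c) / sd for c, sd in zip(clusters, sds)]
        for j in range(len(R))
    ]
    # Cosines (correlations) among the oblique factors.
    cosines = [
        [sum(R[i][k] for i in a for k in b) / (sa * sb)
         for b, sb in zip(clusters, sds)]
        for a, sa in zip(clusters, sds)
    ]
    return structure, cosines


# HYPOTHETICAL correlation matrix: items 0-1 form one cluster, items 2-3 another.
R = [
    [1.0, 0.6, 0.1, 0.2],
    [0.6, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.5],
    [0.2, 0.1, 0.5, 1.0],
]
structure, cosines = multiple_group(R, [[0, 1], [2, 3]])
for j, row in enumerate(structure):
    print(f"item {j}:", [round(x, 2) for x in row])
print("cosine between factors:", round(cosines[0][1], 2))
```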


RESULTS 


Most of the 9 subgeneral factors (lettered A
through I) were defined by items many of
which also loaded on 1 or more of the 17
group factors (numbered 1-17). The
following subgeneral factors were defined by items
at least half of which also had substantial
loadings on the indicated group factors:
on A: 6, 7, 10, 11, 13, and 15; on B: 3; on C:
6 and 10; on D: 8 and 17; on E: 3, 4, 5, and
14; on F: 4, 7, 16, and 17; on G: 3, 14, and
15; on H: none; on I: 7.

Of the 400 items in the analysis, only 63
failed to load at least as high as ±.20 on any
of the 26 factors.4


3 The author wishes to express his appreciation to
David P. Campbell of the Center for Interest Measurement
Research, University of Minnesota, for
making these data available.


Subgeneral Factor A 


Of the 116 items with loadings of at least
±.20, those with the highest positive loadings
indicate a broad interest in designing
costumes (.89) and children’s clothes (.80), 
interior decorating (.65), fashion modeling 
(.53), illustrating (.52), managing a wom- 
en’s style shop (.52), decorating a room with 
flowers (.50), teaching art (.49), dressmak- 
ing (.47), displaying merchandise in a store 
(.47), being a beauty specialist (.46), buying 
merchandise (.46), being a florist (.46), and 
experimenting with new beauty preparations 
(.45). Middle-range positive loadings indicate 
an interest in such things as being an actress 
(.30), dramatist (.37), music composer 
(.29), opera singer (.29), professional dancer 
(.40), or radio-TV singer (.32), acting as a 
cheerleader (.30), and entertaining others 
(.31). These loadings, along with those in the 
lower range, seem to indicate a general aesthetic
interest and would seem to be consistent
with a summary label such as “Artistic
Endeavors and Performing Arts.”


Subgeneral Factor B 


Sixty-five items have loadings of at least
±.20. Interest in being a surgeon (.86),
physician (.78), bacteriologist (.54), or biologist
(.53), watching an open-heart operation
(.53), being a nurse (.49), taping a
bruised ankle (.48), and performing scientific
experiments (.47), as well as being a member
of any profession varying from criminal
lawyer (.31) to professional golfer (.20),
makes the label “Medical and Professional
Endeavors” seem logical.


4 Due to the pagination involved, only those items
which were responsible for naming each factor will
be listed and discussed. Factor loading tables for
each of the 26 factors, as well as the factor loading
matrix, are available from the author.


Subgeneral Factor C 


Fifty-seven items help to describe this 
factor. The highest positive loadings indicate 
an interest in being a musician (.80), music 
composer (.74), or opera singer (.56), and 
playing the piano (.51). Lower positive loadings
reveal an interest in art (.28, .21, .23,
[illegible], .26, [illegible], .39, .26), drama (.30, .39,
.31, .29), poetry (.43, .35), and writing (.20,
.24, .23, .30, .38, .20, .20). Clearly, the name
“Music and Other Arts” would be appropri- 
ate. 


Subgeneral Factor D 


Sixty-three items are responsible for defin- 
ing Factor D, and support interests in being 
a criminal lawyer (.70), corporation lawyer 
(.64), or judge (.63), being the governor of 
a state (.51) or a politician in general (.45), 
and being an employment manager (.32), 
life insurance saleswoman (.29), or hotel 
manager (.25). Therefore, the suggested label 
is “Law, Leadership, and Business.” 


Subgeneral Factor E 


Factor E is defined by 80 items. Those 
with the highest positive loadings deal with 
being a statistician (.76) or bookkeeper (.75), 
studying statistics (.70), being an income tax 
accountant (.65) or computer operator (.61), 
making statistical charts (.61), and operating
office machines (.58), among other quantitative,
technical-type endeavors. On this
basis, it is suggested that the factor be described
as “Quantitative-Technical Clerical.”


Subgeneral Factor F 


Factor F is represented by 104 items. 
“White-Collar Work” seems to be the implied 
interest, inferred from such items as being a 
nurse’s aid (.63), nurse (.50), dental as- 
sistant (.49), office clerk (.49), receptionist 
(.49), typist (.46), hospital records clerk 
(.45), and stenographer (.45). 


Subgeneral Factor G 


A scientific factor is indicated by the 43 
items of Factor G. Such items as botany 
(.63), nature study (.55), zoology (.55), geology
(.50), and bird watching (.46), as
well as those with relatively low positive 
factor loadings, suggest that this factor would 
be most accurately called “Natural Science.” 


Subgeneral Factor H 


Thirty-four items load at least ±.20 on
Factor H. Interest in physical education 
(.69), being an athletic director (.68), being 
a physical education director (.60), being a 
tennis champion (.53), and engaging in 
physical activity (.51) indicates that the 
factor should be named, simply, “Athletic.” 


Subgeneral Factor I 


The 49 items comprising Factor I have, on 
the average, the lowest positive factor load- 
ings of the nine subgeneral factors. The factor 
is sufficiently defined, however, to permit 
easy interpretation. Items pertaining to being 
an employment manager (.60), hotel mana- 
ger (.48), or office manager (.46), plus other 
very similar items, are consistent with label- 
ing the factor “Managerial.” 


Group Factor 1 


Twenty-one items define Factor 1. Teach- 
ing kindergarten (.66), grade school (.55), 
and children in general (.51), and managing 
a children’s nursery at a resort hotel (.45) 
strongly indicate interests in “Children and 
Teaching.” 


Group Factor 2 


“Social Science” best describes Factor 2, 
summarizing the 9 relevant items; psycholo- 
gist (.67), psychology (.61), and sociology 
(.35) are the items with the highest positive 
loadings. 


Group Factor 3 


Fifteen items define Factor 3. Being a sci- 
entific research worker (.57) or chemist (.50), 
performing scientific experiments (.49), and 
doing research work (.42) suggest the label 
“Research.” 


Group Factor 4 


The 17 items which define Factor 4 are 
best summarized by “Secretarial”: shorthand 
(.58), stenographer (.55), typist (.54), and 
private secretary (.50).


Group Factor 5


A “Quantitative” interest is indicated by 
the 6 items of Factor 5: algebra (.68), geo- 
metry (.55), arithmetic (.45), calculus (.37), 
physics (.28), and chemistry (.24). 


Group Factor 6 


Sixteen items are responsible for describing 
Factor 6; interest in being an artist (.58), 
liking prominent artists (.46), and teaching 
art (.45) suggest the title “Artistic.” 


Group Factor 7 


Twenty-five items describe Factor 7. The 
items are bunched together in a relatively low 
range of factor loadings. Nevertheless, it 
seems most logical to interpret this factor as 
indicating interest in “Sales and Business.”
Retailer (.38), specialty saleswoman (.34), 
and waitress (.22) are representative items. 


Group Factor 8 


Factor 8 is represented by 14 items. Those 
dealing with politicians (.36, .63, .37, .27, .29, 
.30) and electioneering for office (.49), as 
well as several dealing with various outlets 
for speaking, define this factor as “Politics 
and Performing Arts.” 


Group Factor 9 


Twelve items describe Factor 9 as having 
to do with civics (.69), political science (.57), 
and history (.47), in addition to other areas 
of the “Humanities.”


Group Factor 10 


Dramatist (.61), famous actress (.51), ac- 
tress (.48), and dramatics (.47) are the most 
prominent of the 17 items that describe Fac- 
tor 10 as a “Dramatics” factor. 


Group Factor 11 


Nineteen items load at least ±.20 on Factor
11. The items that are most responsible
for naming the factor “Literary” are author 
of best-selling novel (.59), magazine writer 
(.53), and author of novel (.50). 


Group Factor 12 


Items dealing with cooking (.49, .89), trying
new recipes (.68), and preparing dinner
for guests (.33, .41, .56, .39) have the highest
loadings of the 16 items that make it seem
logical to interpret Factor 12 as being an
indicator of “Culinary” interests.


Group Factor 13 


All 4 items defining Factor 13 obviously 
represent a “Sewing” interest: preparing a
meal versus making a dress (-.62), sewing
(.56), dressmaker (.39), and home economics 
(.29). 


Group Factor 14 


The 14 items representing Factor 14 are 
concerned with repairing electrical wiring 
(.55), operating machinery (.51), inventing 
(.45), and tinkering (.45); they define the 
factor as “Mechanical.” 


Group Factor 15 


An “Agrarian” interest is indicated for 
Factor 15 by those items, out of 10, with the 
highest loadings: planning the landscape 
(.48), landscape gardener (.43), and raising
flowers and vegetables (.43). 


Group Factor 16 


Factor 16 clearly indicates a “Religious” 
interest, since practically all of the 15 items 
are directly related to spiritual matters: read- 
ing the Bible (.66), going to church (.62), 
religious people (.59), church worker (.57), 
and so forth. 


Group Factor 17 


Ten items are responsible for describing 
Factor 17. Since policewoman (.69) has the 
only relatively high loading, the factor would 
seem to warrant the name “Law Enforce- 
ment.” 


Residuals 


Only three residuals had factor loadings 
greater than .35 (.38, .38, .43), and each 
comprised a couplet (correlated highly with 
only one other item). There were no residual 
factors. 


Measurement Scales 


“Homogeneous” factor-type scales were
constructed by identifying those items which
can be considered to be relatively pure measures
of each of the 26 factors. The items
within each homogeneous scale have a high
relationship to each other and a low relationship
to the other items in the inventory.5


DISCUSSION 


The vocational interests of women are in- 
deed complex, not to be explained away by 
the reductionistic dichotomy of “career ver- 
sus homemaking.” In other words, the high 
specificity of structure—many identifiable 
factors—strongly suggests that women’s vo- 
cational interests are far more complex than 
has been widely assumed. 

The results of any analysis, however, are 
highly dependent upon the restrictions of the 
data to be analyzed. One restriction is that 
the dimensions of women’s vocational in- 
terests are partially a function of the concep- 
tion of those interests by those who construct 
the instrument to be used, in this case, the 
SVIB. For example, women were conceived 
to be interested in religion; several religion 
items were therefore included, which conse- 
quently yielded a “Religious” factor. Such 
was not the case with the men’s form, how- 
ever. (For comparison, the reader is referred 
to the Cranny [1967] factor analysis of the 
men’s form which in many respects was a 
companion study to the present analysis.) In 
addition, the complexity of factors which re- 
sulted from the present analysis was based, 
to a large extent, on a sample composed of 
professional women. 

The 26 factors isolated in the present study 
provide for a deeper understanding of the 
theoretical constructs underlying women’s vo- 
cational interests, both in the complexity of 
those constructs and in the content of the 
specific factors. While a better understanding 
of the origin and development of women’s
interests has not been gained, it is meaning- 
ful that so many of the 26 factors are not 
directly vocationally oriented. Rather, many 
of the interests seem to be concerned with 
such things as homemaking and the feminine 
role. Perhaps this is a verification of the difficulty
in measuring women’s interests in general.
Certainly, the present results reflect an
uncertainty of the role of interests in vocational
choice, which is particularly true for
women of college age, who do not tend to
think in terms of long-term careers (Harmon,
1967a; Super, 1957).


5 Available upon request from the author.

Dunteman (1966) has commented on the 
communalities between two of the factor 
analyses (Anderson, 1965; Crissy & Daniel, 
1939) which utilized scale scores rather than 
responses to individual items. The present 
study, however, stands alone and is not di- 
rectly comparable with such studies. It is 
anticipated that the complex computer tech- 
nology now available will be applied in simi- 
lar ways in the future. 


Research Implications 


Clark (1960) has said that “homogeneous” 
measurement scales are the logical end result 
of a search for meaning such as the search 
made by the present study for more basic 
knowledge of the vocational interests of 
women. Having identified such scales, the 
question now arises as to their use. One ap- 
plication of the measurement scales would be 
to construct a short form of the women’s form 
of the SVIB. 

The possibility of the factor structure 
changing over time now needs to be recog- 
nized and should be investigated. This, along 
with an investigation of the predictive valid- 
ity of the measurement scales and compari- 
sons with replications of the present study and 
similar analyses of other interest measures, 
would seem to be possible as well as relevant. 
Overall, the results of the present study should 
facilitate future efforts toward the develop- 
ment of a sound theoretical structure of wom- 
en’s vocational interests. 
OCCUPATIONAL GROUP AS A MODERATOR OF THE JOB
San Diego, California


Personnel comprising wintering-over parties at small scientific stations in 
Antarctica represent two broad but quite different occupational groups: ci- 
vilian scientists and Navy enlisted men. The motivations of the Navy enlisted 
men who volunteer are less related to their specific jobs in the Antarctic than 
are those of the civilian scientists. The results confirmed the hypothesis that 
occupational group is a moderator of the job satisfaction—job performance 
relationship, and that the relationship is more pronounced for the scientist 


group than for the Navy enlisted group. 


One of the more unequivocal conclusions 
that has been drawn from past research on 
employee attitudes and job performance is 
that there is not a simple relationship be- 
tween the two (Brayfield & Crockett, 1955; 
Kahn, 1960; Katzell, Barrett, & Parker, 
1961) and that not much is likely to be 
learned from simple two-variable designs. 
Katzell et al. (1961) suggest that job satis- 
faction and performance should be considered 
as separate outputs of the work situation 
which, depending upon the intervening varia- 
bles of employee needs and expectations, may 
or may not be correlated. 

Differences in employee needs and expecta- 
tions seemed to be reflected in occupational 
levels. Centers and Bugental (1966) demon- 
strated that, at higher occupational levels, 
intrinsic job components (opportunity for 
self-expression, interest-value work, etc.) were 
more valued while, at lower occupational 
levels, extrinsic job components (pay, secur- 
ity, co-workers, etc.) were more highly re- 
garded. 


1 Report Number 69-11, supported by the Bureau 
of Medicine and Surgery, Navy Department, under 
Research Work Unit MF12.524.003-9005. Opinions 
expressed are those of the authors and are not to 
be construed as necessarily reflecting the official view 
or endorsement of the Department of the Navy. The 
assistance of Mr. David H. Ryman is gratefully 
acknowledged. 

2 Requests for reprints should be sent to Richard 
E. Doll, Department of the Navy, Navy Medical 
Neuropsychiatric Research Unit, San Diego, Cali- 
fornia 95152. 


Tannenbaum (1966) notes that one of the 
weaknesses in the hypothesis that associates 
productivity with satisfaction is the failure to 
distinguish between satisfaction and motiva- 
tion. He points out that a person may be 
satisfied with his work insofar as his needs are 
met but his satisfaction indicates little about 
his motivation to work, particularly when his 
satisfaction does not depend on the amount of 
effort he puts into his work. 

Personnel comprising wintering-over parties 
at small scientific stations in Antarctica rep- 
resent two broad but quite different occupa- 
tional groups. One group consists of civilian 
scientists whose sole raison d’etre revolves 
around their individual scientific project. The 
other broad occupational group is composed 
of United States Navy enlisted men whose 
reasons for participating tend to be more di- 
verse and less specific. Questionnaire data 
have shown that these men offer such reasons 
for volunteering as saving money, adventure, 
promotional opportunities, experience, etc. In 
other words, the motivations of the Navy en- 
listed men would seem to be less related to 
their specific jobs in the Antarctic (e.g., con- 
struction, mechanics, electronics) than those 
of civilian scientists. 

The purpose of the present study was to 
investigate the hypothesis that occupational 
group, as defined herein, is a moderator of the 
job satisfaction—job performance relationship 
and that the relationship will be more pro- 
nounced for the scientist group than for the 
Navy enlisted group. 

METHOD

Subjects 


The sample consisted of 129 Navy enlisted men 
and 66 civilian scientists who were assigned to 
United States Antarctic stations for one year. All Ss 
were volunteers who had been selected for Antarc- 
tic duty primarily on the basis of occupational 
competence. The Navy enlisted group was largely 
composed of Navy construction personnel (“Sea- 
bees”) and Navy technical and administrative per- 
sonnel. 

The mean ages and educational levels for the two 
occupational groups were as follows: Navy enlisted 
—mean age 27.5 yr. and mean education 11.3 yr.; 
Scientists—mean age 25.7 yr. and education level 
at least a BA. 


Procedure 


The Ss filled out biographical, personality, and 
attitude inventories as part of the psychiatric assess- 
ment program for Antarctic volunteers. Along with 
specially constructed attitude inventories, perform- 
ance measures consisting of supervisor ratings and 
peer nominations were obtained on two occasions 
during the year of duty in Antarctica, early in the 
period of winter isolation, and again near the end of 
winter isolation approximately 6 mo. later. 

The data for the present study were based on 
two self-report scales reflecting job satisfaction and 
three criterion scores of job performance as derived 
from supervisor ratings, peer nominations, and a 
combination of the two. The job satisfaction scores 
and the job performance scores were obtained near 
the end of the winter isolation. 

The job satisfaction scales were originally devel- 
oped from a series of factor analyses of a set of 
attitude items administered to combined samples of 
Ss. These analyses produced two factors which 
seemed to fall under the rubric of job satisfaction. 
The finding of two factors is congruent with the 
conclusion of Hulin and Smith (1965) who point 
out that job satisfaction is not a unidimensional 
variable but is made up of a number of factors or 
distinct areas of satisfaction. 


TABLE 1

CORRELATIONS BETWEEN JOB PERFORMANCE
RATINGS AND JOB SATISFACTION SCALES

                  Navy            Scientist          Combined
                (N = 129)         (N = 66)          (N = 195)
Ratings          A      B         A      B          A      B
Supervisor      .05    .03       [?]   .30**       [?]    [?]
Peer            .03    .04      .27*   .47**      -.09   .14*
Combined        .04    .01      .33**  .44**       .03   .16*

Note. A = Job Morale Scale; B = Job Importance Scale. [?] = entry illegible in the source.
* p < .05.
** p < .01.
As differences between the Navy and civilian
occupational groups became apparent, intercorrelation
matrices of the scale items were examined for each 
group separately. The matrices were found to be 
very similar and only minor changes in scale con- 
tent were required to make the job satisfaction scales 
comparable for the two groups. 

The first scale consisted of five items reflecting a 
rather general expression of satisfaction or dissatis- 
faction with the wintering-over experience and was 
labeled Job Morale. This scale consisted of the fol- 
lowing items: (a) I would like to go on another 
Antarctic expedition in the future. (b) I am happier 
with my job here than I was in my last assignment. 
(c) Time passes too slowly in the Antarctic (response
value reversed). (d) I wish I could stay in
the Antarctic longer than now planned. (e) I often 
wish I had never come to the Antarctic (response 
value reversed). 

The second scale consisted of four items and was 
labeled Job Importance. The Job Importance Scale 
was composed of the following items: (a) My pres- 
ent duties employ my abilities in the best possible 
way. (b) The success or failure of this station de- 
pends as much on my job as any other. (c) My job 
here is important enough to justify my spending all 
this time in the Antarctic. (d) I feel that I am 
contributing as much on this expedition as others 
are. 

A 6-point scale, anchored at the ends by “strongly 
disagree” and “strongly agree” was used for each 
item with a range from 1 to 6, with reversals where 
appropriate. The scale scores were the algebraic 
sums of the item weights. 
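
The scoring rule just described (1 to 6 per item, reversed where noted, summed) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the item labels follow the (a) to (e) lettering of the Job Morale items, and the function name and the example responses are hypothetical.

```python
# Items (c) and (e) of the Job Morale Scale are marked "response value reversed".
REVERSED_ITEMS = {"c", "e"}

def score_scale(responses, reversed_items=frozenset()):
    """Sum 1-6 item responses, reversing flagged items (1<->6, 2<->5, 3<->4)."""
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= 6:
            raise ValueError(f"response for item {item!r} must be between 1 and 6")
        total += (7 - value) if item in reversed_items else value
    return total

# Hypothetical responses of one man to the five Job Morale items.
example = {"a": 5, "b": 4, "c": 2, "d": 3, "e": 1}
morale = score_scale(example, REVERSED_ITEMS)  # 5 + 4 + (7 - 2) + 3 + (7 - 1)
```

Under this rule the five-item Job Morale score can range from 5 to 30 and the four-item Job Importance score from 4 to 24.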

The job performance criteria were based on inde- 
pendent ratings by two station leaders (supervisors) 
and nominations by all peers, both military and 
civilian. Each man assigned to the station was rated 
by the supervisors for the characteristics of “in- 
dustriousness,” “motivation,” and “proficiency” ac- 
cording to an 8-point rating scale format. The peer 
nominations consisted of each man nominating the 
five most outstanding men on “industriousness.” 
Criterion scores were converted to T scores (M = 
50; SD =10) without regard for occupational group- 
ings. The third job performance criterion score was 
obtained by combining the supervisor and peer cri- 
terion scores. 
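
The T-score conversion mentioned above rescales a set of raw criterion scores to a mean of 50 and a standard deviation of 10, computed over all men regardless of occupational group. The sketch below assumes the population standard deviation, which the source does not specify, and uses hypothetical nomination counts.

```python
from statistics import mean, pstdev

def to_t_scores(raw_scores):
    """Rescale raw scores to mean 50 and standard deviation 10 (T scores)."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD is an assumption, not stated in the source
    if sd == 0:
        raise ValueError("all scores are identical; T scores are undefined")
    return [50 + 10 * (x - m) / sd for x in raw_scores]

# Hypothetical raw peer-nomination counts for six men, pooled across both groups.
nominations = [0, 2, 3, 5, 7, 7]
t_scores = to_t_scores(nominations)
```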

Pearson correlations were computed between the 
two job satisfaction scores and the three criterion 
scores for the Navy enlisted group and the civilian 
scientist group separately and combined. Means and 
standard deviations for the various comparisons also 
were computed. 


RESULTS AND DISCUSSION 


The results presented in Table 1 confirm 
the hypothesis that measures of job satisfac- 
tion and job performance would be more 
closely related within the civilian scientist 
group than within the Navy enlisted group. 




None of the job satisfaction–job performance
correlations within the Navy enlisted group
approached significance (r > .14 required for p < .05),
while within the civilian scientist group all 
correlations were significant. 

By not separating the two groups, one 
would conclude from the correlations shown 
under the combined group that there was 
little, if any, relationship between job per- 
formance and job morale, and a barely sig- 
nificant relationship between job importance 
and job performance. The ambiguity in in- 
terpretation resulting from combining two 
such widely different occupational groups is 
exemplified by the negative correlation ob- 
tained between the Job Morale Scale and the 
job performance peer ratings despite the fact 
that a positive correlation existed in both the 
Navy enlisted and scientist group compari- 
sons. This paradox can be explained by exam- 
ining the means of the peer ratings and the 
Job Morale Scale shown in Table 2; the Navy 
enlisted group had a higher peer rating score 
than the scientist group but a lower Job 
Morale score. When the two groups were com- 
bined, a negative relationship appeared. The 
standard deviations shown in Table 2 strongly 
suggest that the differences in correlations 
given in Table 1 are not simply restriction of 
range artifacts. 
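
The reversal of sign on pooling can be reproduced with a small constructed example. The numbers below are invented for illustration and are not the study data; they only mimic the pattern in Table 2, where one group has the higher peer rating and the other the higher Job Morale score.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented scores: group 1 has lower morale but higher peer ratings than group 2.
morale_1, peer_1 = [14, 16, 18, 20], [50, 52, 53, 56]
morale_2, peer_2 = [21, 23, 25, 27], [43, 45, 47, 48]

r_group_1 = pearson_r(morale_1, peer_1)                        # positive within group
r_group_2 = pearson_r(morale_2, peer_2)                        # positive within group
r_pooled = pearson_r(morale_1 + morale_2, peer_1 + peer_2)     # negative when pooled
```

Within each group morale and peer rating rise together, yet the difference in group means dominates the pooled coefficient and turns it negative.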

In addition to the above analysis, correla- 
tions were computed between the two job sat- 
isfaction scales for the two occupational 
groups separately. The correlation was .19 
for the Navy enlisted group and .38 for the 
scientist group. This is a rather substantial 
difference and would indicate that occupa- 
tional groups, as used in this study, might 
not only be a moderator of the job satisfac- 
tion—job performance relationships, but may 
also be a moderator of the job importance— 
overall job satisfaction relationship. 
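
The source reports only the two coefficients and no test of their difference. One conventional way to gauge such a difference between independent samples, offered here as an outside illustration and not as the authors' analysis, is Fisher's r-to-z transformation with a normal approximation.

```python
from math import atanh, erf, sqrt

def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples."""
    z_diff = atanh(r2) - atanh(r1)
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = z_diff / se
    p_two_tailed = 1 - erf(abs(z) / sqrt(2))  # normal approximation
    return z, p_two_tailed

# Coefficients and group sizes as reported in the text.
z, p = compare_independent_correlations(0.19, 129, 0.38, 66)
```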

TABLE 2

MEANS AND STANDARD DEVIATIONS OF JOB PERFORMANCE
SCORES AND JOB SATISFACTION SCALES

                 Navy          Scientist       Combined
Variable        M     SD       M     SD       M     SD
Supervisor     50.5   8.9     49.7  10.4     49.9   9.4
Peer           52.8   8.9     46.1   7.0     50.0   9.0
Combined       51.3   8.0     47.8   8.0     49.8   8.3
Morale         17.6   5.0     23.4   5.0     19.4   5.6
Importance     17.2   4.6     16.0   4.6     16.6   4.8

The results of the present study support the
notion that the job satisfaction–job performance
relationship is not a simple bivariate
function. It was found that job performance
ratings, both supervisor and peer, are substantially
related to measures of job satisfaction
for a group whose motives for being in
Antarctica were primarily oriented toward
specific tasks and goals but do not relate at
all within a group whose motivations for volunteering
are more diverse and less highly
related to the job itself.

A research program involving over 20,000 door-to-door salesmen resulted in
the development of an objectively scored mail questionnaire which effectively 
eliminated three-fourths of those who failed to meet minimum standards of 
sales performance. The content of the questionnaire and the scoring key 
clearly indicated that the personal characteristics which made for sales success 
were contrary to what the company had previously believed. 


The company which is the subject of the 
article is a direct seller, that is, a company 
which distributes its merchandise by means 
of door-to-door salesmen. The National Asso- 
ciation of Direct Selling Companies, the trade 
association for the industry, reports that there 
are about 1,200 such companies doing busi- 
ness in the United States; and although the 
total volume of business done by these com- 
panies is not precisely known, the total vol- 
ume of the 200 members of the association is 
approximately five billion dollars annually. 

Like many direct sellers, this particular 
company recruits its sales force almost en- 
tirely by mail. Every week about 2,000 teen- 
age boys apply for jobs as salesmen in re- 
sponse to direct mail solicitation and to news- 
paper and magazine advertisements. Appli- 
cants are then mailed sales materials following 
examination of their application blanks and a 
delinquency check of the company records. 
For many years the application blank con- 
sisted only of the applicant’s name and ad- 
dress, age, sex, and a parental endorsement; 
and virtually all of the applicants who applied 
were accepted. Although this company has 
long been established and operates a very 
profitable business, under this system a large 
majority of the boys who were started each 
week failed to perform to the company’s mini- 
mum standard of sales performance. A sales 
failure was defined as someone who did not 
return a sales volume more than sufficient to 
recover the cost of recruiting and the mailing 
of materials. Generally this was a person who 








1 Requests for reprints should be sent to Valen- 
tine Appel, Grudin/Appel Research Corporation, 105 
Madison Avenue, New York, New York 10016. 
voluntarily discontinued selling in the first
month. 

The situation presented an unusual research 
opportunity for improving the efficiency of 
the company’s selection procedures. To add to 
the gravity of the problem, there are very few 
organizations—outside of the armed forces— 
where such large samples of personnel are 
continuously recruited to perform exactly the 
same job. Moreover, data were economically 
available. Since the company policy is to 
recruit through the mails, mail questionnaires 
were mandatory, and there was no question 
of nonrespondent bias, which is a major prob- 
lem in most mail surveys. 


THE RESEARCH PROGRAM 


The research naturally divided itself into 
four basic parts: (a) a preliminary qualita- 
tive investigation, (b) the development of 
questionnaire items, (c) tests using nonreturn 
of the questionnaire as a screening device, 
and (d) revision and validation of the ques- 
tionnaire items. Although each phase of the 
program has been duplicated one or more 
times as more data were required, for sim- 
plicity of presentation only one study from 
each of the four phases will be described here. 


Qualitative Investigation 


The initial step was to conduct a series of 
informal personal interviews in various parts 
of the country among boys who had been 
previously classified as successful or unsuc- 
cessful. In addition, prior to interviewing the 
salesboys, personal interviews were conducted 
with members of the company’s sales depart- 
ment having responsibility for recruitment 
and selection. As a result of the investigation, 
it became obvious that most of the company
executives believed that the potentially suc- 
cessful salesboy was one who came from a 
lower income home in which money was not 
easy to come by, and where it was necessary 
for the children to earn their own spending 
money. The following quotations, each from 
a different interview, will illustrate: 


A boy whose family is... preferably in the 
lower income brackets. Generally speaking, boys 
from higher income families can get money much 
easier than by selling. 


A boy from a home with a number of brothers 
and sisters ... from a home where his parents 
cannot give him too much. If he wants money, he 
must find a way to get it himself. 


A boy from a broken home who needs to find a 
way of life ...a boy who wants additional 
money because of limited parental income. 


In contrast to these Horatio Alger expecta- 
tions, the interviewers who visited the boys 
in their homes reported that the successful 
boys appeared to be ordinary middle class 
teenagers. The unsuccessful boys, however, 
showed a marked contrast. They were diffi- 
cult to locate, because they did not appear to 
live in one place for any length of time; and 
the homes in which they had previously lived 
appeared typically to be run down. When the 
children were located, they were frequently 
found in households which had been deserted 
by one parent or the other, and in general 
their families were depressed economically. 


Questionnaire Development 


On the basis of the preliminary qualitative 
investigation, a series of eight multiple-choice 
questions was prepared for use in a mail 
questionnaire. These questions were specifi- 
cally written to resolve the severe contradic- 
tion which existed between the prevailing 
opinion in the sales department and the re- 
ports of the interviewers who contacted the 
salesboys in the field. 

Because of the sensitive nature of the data 
which were required, plus the fact that a mail 
questionnaire was to be employed, some dis- 
guise in questioning appeared to be in order. 
Accordingly, questions were written which 
were known or believed to be correlated with 
family cohesiveness and socioeconomic status, 
but which on the surface appeared not directly
related. Such questions related to bicycle
ownership, telephone subscription, participation
with parents in recreational activities,
use of money earned, and size of family.


TABLE 1

SOME OF THE DISCRIMINATING QUESTIONNAIRE RESPONSES

                                     Successful   Unsuccessful
Questionnaire responses              applicants   applicants
Has a bicycle                           75%          52%
Family receives a newspaper             12%          52%
Attended show or circus with parents    64%          43%
Has a telephone at home                 63%          38%
Signed application in pen               59%          46%
Saves money earned                      51%          38%
Family less than 6 children             54%          30%
Base: all returns (a)                   195          365

(a) Actually these data are from a later mailing, a sample
independent of the one upon which the original analysis was
conducted and which confirmed the original findings.


The questionnaires were mailed to all the 
boys (approximately 2,000) who had been 
started as salesboys in a given week. By the 
close of tabulation exactly two-thirds of the 
questionnaires had been returned. The sample 
was then divided into random halves on an 
every other questionnaire basis. Using one of 
the random halves, the questionnaire re- 
sponses of the successful boys were compared 
with those who were less successful. 
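
The split into halves "on an every other questionnaire basis" and the item-by-item comparison of the two success groups can be sketched as below. The record layout, field names, and data are hypothetical; only the alternating split and the comparison of endorsement rates come from the text.

```python
def split_alternating(records):
    """Split returned questionnaires into two halves on an every-other basis."""
    return records[0::2], records[1::2]

def response_rate(records, item, successful):
    """Share of boys with the given success status who endorsed an item."""
    group = [r for r in records if r["successful"] == successful]
    if not group:
        return float("nan")
    return sum(r[item] for r in group) / len(group)

# Hypothetical returns, in order of receipt: success flag plus one yes/no item.
returns = [
    {"successful": True, "has_bicycle": True},
    {"successful": True, "has_bicycle": True},
    {"successful": False, "has_bicycle": False},
    {"successful": False, "has_bicycle": True},
    {"successful": True, "has_bicycle": True},
    {"successful": False, "has_bicycle": False},
    {"successful": False, "has_bicycle": False},
    {"successful": True, "has_bicycle": False},
]
development_half, holdout_half = split_alternating(returns)
gap = (response_rate(development_half, "has_bicycle", True)
       - response_rate(development_half, "has_bicycle", False))
```

Only the development half is inspected at this stage; the other half is held back so that a key built from the first can be checked on data it has not seen.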

This comparison, which is illustrated in 
Table 1, essentially confirmed the findings of 
the original qualitative study in which the 
boys were visited in their homes. Moreover, 
the data contradicted the prior opinions of 
the sales executives who believed that sales 
success came of economic need. Paradoxically,
the reverse appeared to be true. Those
who appeared least in need of money also 
appeared most likely to exert themselves to 
earn it. The successful boy, when compared 
with his unsuccessful counterpart, was more 
likely to own a bicycle (75% vs. 52%), par- 
ticipate in recreational activities with his par- 
ents, have a telephone at home, come from a 
family of less than six children, and so forth. 

On the basis of this comparison, a unit- 
weighted scoring key was developed in which 
those responses which were characteristic of 
successful boys were scored +1 and those 

TABLE 2

RELATIONSHIP BETWEEN QUESTIONNAIRE
RESPONSE AND JOB SUCCESS

                                   Returned questionnaire and scored in
                       Failed to   Bottom     Middle     Top
Applicants             return      third      third      third
Successful                8%         22%        46%        59%
Unsuccessful             92%         78%        54%        41%
Base: all applicants     309         221        205        198




which were characteristic of unsuccessful boys 
were scored —1. All other responses were 
scored zero. This key was then applied to 
the remaining half-sample which had not been 
examined to this point. The total scores on 
the questionnaire were then related to whether 
or not each boy met the company’s minimum 
standard of sales success. 
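
A unit-weighted key of this kind can be sketched as follows. The source does not say how large a difference made a response "characteristic" of one group, so the `min_gap` threshold is an assumption, as are the function names and response labels. The rates are taken from Table 1 for illustration, with the "no bicycle" rates implied by the "yes" rates.

```python
def build_unit_key(rates_successful, rates_unsuccessful, min_gap=0.10):
    """Assign +1, -1, or 0 to each response from the two groups' endorsement rates."""
    key = {}
    for response, p_success in rates_successful.items():
        gap = p_success - rates_unsuccessful[response]
        if gap >= min_gap:
            key[response] = 1       # characteristic of successful boys
        elif gap <= -min_gap:
            key[response] = -1      # characteristic of unsuccessful boys
        else:
            key[response] = 0       # does not discriminate
    return key

def score_questionnaire(chosen_responses, key):
    """Total score: sum of the unit weights of the responses a boy gave."""
    return sum(key.get(response, 0) for response in chosen_responses)

successful = {"bicycle:yes": 0.75, "bicycle:no": 0.25,
              "telephone:yes": 0.63, "signed_in_pen:yes": 0.59}
unsuccessful = {"bicycle:yes": 0.52, "bicycle:no": 0.48,
                "telephone:yes": 0.38, "signed_in_pen:yes": 0.46}
key = build_unit_key(successful, unsuccessful)
total = score_questionnaire(["bicycle:no", "telephone:yes", "signed_in_pen:yes"], key)
```

Building the key on one half-sample and applying it to the other, as the text describes, keeps the validation from capitalizing on chance differences in the development data.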

The results of this analysis are summarized 
in Table 2 which divides the second half-sam- 
ple into four groups: those who failed to 
return the questionnaire, and those who did 
return and who scored in the top, middle, and 
bottom thirds using the scoring key which 
had been independently developed using the 
first half-sample. By this device the per- 
centage of successful applicants was increased 
from 8% to 59%, depending upon whether or 
not the questionnaire was returned and how 
well it scored. 


Nonreturn as a Screening Device 


Our intention in developing the question- 
naire items was to produce material for in- 
clusion in the application forms which were 
to be part of the recruitment ads. It was never 
the intention to send a questionnaire to sales 
applicants as a second and separate mailing. 
The possibility, therefore, of using a second 
mailing and of screening out all those who 
failed to return the questionnaire, as implied 
by Table 2, was frankly unexpected. More- 
over, owing to the high cost of recruitment 
plus the limited availability of applicants, 
there was resistance by the sales department 
to the use of a second mailing because of the 
possibility that a large percentage of success- 
ful boys might be lost through their failure to 
return the questionnaire. In addition, there 
is a common belief among direct sellers that 
even a brief delay in placing sales material 
in the hands of sales applicants reduces the
probability of their being successful. Any 
delay is presumed to reduce the motivation 
of the applicant. 

There was the additional question of 
whether the failure of the unsuccessful appli- 
cants to return the questionnaire would be 
predictive of their later failure or whether 
the nonreturn was caused by their prior fail- 
ure. After all, the questionnaires were re- 
ceived by the applicants following receipt of 
merchandise to sell, and it was entirely likely 
that the decision not to return the question- 
naire was prompted by their prior sales fail- 
ure. 

In order to resolve these doubts a special 
test was conducted involving 483 sales appli- 
cants who were divided into two matched 
samples on an every-other-name basis. One 
sample—the screened sample—was mailed a 
questionnaire and then merchandise to sell 
only upon return of the completed question- 
naire. The unscreened sample was mailed 
merchandise immediately as per the usual 
procedure, and no questionnaire was sent at 
all. The results of this test are summarized in 
Table 3 from which two conclusions may be 
drawn: (a) The screening did not reduce the 
percentage of successful applicants (10% in 
the screened sample and 9% in the unscreened 
sample), and (b) the use of the questionnaire
as a screening device had effectively elimi- 
nated half the unsuccessful applicants (45% 
out of 90%). 
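
The two conclusions can be checked directly from the percentages in Table 3. The sketch below also computes the success rate among applicants who were actually sent merchandise, a figure implied by the table but not stated in the text; the variable and function names are illustrative.

```python
# Percentages of all applicants in each sample, from Table 3.
screened = {"successful": 10, "unsuccessful": 45, "screened_out": 45}
unscreened = {"successful": 9, "unsuccessful": 91}

def success_rate_among_started(sample):
    """Successes as a share of applicants who were actually mailed merchandise."""
    started = sample["successful"] + sample["unsuccessful"]
    return sample["successful"] / started

rate_screened = success_rate_among_started(screened)        # 10 / 55
rate_unscreened = success_rate_among_started(unscreened)    # 9 / 100
# Share of would-be failures who never received merchandise under screening.
failures_removed = 1 - screened["unsuccessful"] / unscreened["unsuccessful"]
```

Because the two samples were matched, the unscreened failure rate serves as the estimate of how many failures the screened sample would have contained without the questionnaire.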

Armed with this information, the company 
adopted the technique of using a separate 
mail questionnaire. The nonreturn of this 
questionnaire was then employed as a screen- 
ing device. 


TABLE 3

NONRETURN OF THE QUESTIONNAIRE
AS A SCREENING DEVICE

                        Screened    Unscreened
Type of applicant       sample      sample
Successful                10%          9%
Unsuccessful              45%         91%
Screened out (a)          45%         n/a
Base: all applicants      241         242

(a) Questionnaire not returned.
Questionnaire Revision


To this point the scoring of the question- 
naire was not included as a step in the selec- 
tion procedure for the same reason that the 
company was initially reluctant to use the 
nonreturn of the questionnaire as a screening 
device. The limited supply of applicants and 
the need to maintain sales volume made it 
necessary to maintain a minimum sales staff, 
however inefficient that sales staff might be. 
Moreover, because the scoring key had been 
developed upon a sample of boys who may 
have already experienced some measure of 
success or failure prior to having received the 
questionnaire, there was a natural reluctance 
to adopt the scoring key under conditions 
where the questionnaire had to be returned 
before the merchandise could be sent. 

However, since use of the new question- 
naire had now been adopted as standard pro- 
cedure, it became a simple matter to conduct 
additional screening studies to add question- 
naire items, to delete others, and to improve 
the scoring key. Finally, as a result of a series 
of tests involving over 20,000 boys, a 7-item 
questionnaire, printed on a 44 X 8 business 
reply card, was developed to be mailed out 
on receipt of a sales application. On the basis
of these tests the questionnaire was shown
to predict which applicants would later be
successful and which would not, with sufficient
accuracy to be made operational.

The data from the most recent such tests 
are summarized in Table 4 which divides 560 
applicants who returned the questionnaire 
into two groups: those who were successful 
and those who were not. For each of these 
two groups the percentages of applicants 
scoring in top, middle, and bottom thirds is 
shown. From this, it can be seen that by
rejecting all those who failed to score in at
least the middle third, 44% of the potentially
unsuccessful candidates who returned the
questionnaire could be eliminated at the cost
of 16% of the successes.
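
The trade-off at each cutting point follows from accumulating the Table 4 percentages from the bottom third upward. The sketch below is illustrative; the function name and the way cutting points are labeled are assumptions.

```python
# Percent of each group scoring in each third, from Table 4.
THIRDS = ["bottom", "middle", "top"]
successful = {"bottom": 16, "middle": 27, "top": 57}
unsuccessful = {"bottom": 44, "middle": 29, "top": 27}

def rejected_below(cut, distribution):
    """Percent of a group rejected when everyone scoring below `cut` is turned away."""
    return sum(distribution[third] for third in THIRDS[:THIRDS.index(cut)])

# Requiring at least the middle third rejects only the bottom third;
# requiring the top third rejects the bottom and middle thirds together.
for cut in ("middle", "top"):
    print(cut, rejected_below(cut, successful), rejected_below(cut, unsuccessful))
```

Raising the cutting point removes more failures but costs proportionally more successes, which is why the choice of cutoff depends on how scarce applicants are.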

The estimated utility of the entire program, 
including the use of nonreturn of the ques- 
tionnaire as a screening device plus the addi- 
tional screening following scoring of the 
questionnaire, can be calculated from the 

TABLE 4

PERCENTAGES OF SUCCESSFUL AND UNSUCCESSFUL
APPLICANTS REJECTED AT THREE CUTTING POINTS

                        Successful    Unsuccessful
Cutting point           applicants    applicants
Top third                  57%           27%
Middle third               27%           29%
Bottom third               16%           44%
Base: all returns (a)      195           365

(a) Individual item responses reported in Table 1.



data presented in Tables 3 and 4. Table 
3 determined that nonreturn of the ques- 
tionnaire eliminated 50% of the unsuccessful 
applicants at a cost of virtually none of the 
successes. Table 4 further determined that 
if all questionnaire returnees who scored in 
the bottom third were rejected, 44% of the 
failures who were not screened out on the 
basis of questionnaire nonreturn would be 
eliminated. Since we estimate that 50% of the 
potentially unsuccessful applicants were al- 
ready screened out by their failure to return 
the questionnaire, it follows that 44% of un- 
successful returnees represents an additional 
22% of all unsuccessful applicants. From this 
comes the estimate that the program was able 
effectively to eliminate about three-fourths 
(50% plus 22%) of the sales failures at a 
cost of about one-fifth (16%) of the 
successes. 
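
The arithmetic behind this estimate treats the two screens as successive stages: the second stage acts only on applicants who survived the first. A minimal sketch, with an illustrative function name and the rates quoted in the text:

```python
def combined_elimination(stage1_rate, stage2_rate):
    """Overall share removed by two successive screens applied to the same pool."""
    return stage1_rate + (1 - stage1_rate) * stage2_rate

# Failures: 50% removed by nonreturn, then 44% of the remaining half by low score.
failures_removed = combined_elimination(0.50, 0.44)   # 0.50 + 0.50 * 0.44
# Successes: nonreturn cost "virtually none", then 16% lost to the bottom-third cut.
successes_lost = combined_elimination(0.00, 0.16)
```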


DISCUSSION 


Although the use of biographical data in 
the selection of sales personnel has a history 
dating back nearly 50 yr. (Goldsmith, 1922; 
Harrell, 1960; Kornhauser, 1941; Kurtz, 
1941; Manson, 1925), the present study pro- 
vides an unusually clear example of how the 
use of an objectively scored questionnaire, 
economically administered through the mail, 
was able effectively to reduce sales and mar- 
keting costs. The study again points up the 
substantial discrepancy which sometimes ex- 
ists between management opinion and what 
the market facts actually are. A similar dis- 
crepancy was previously demonstrated in a 
study by Blum and Appel (1961) relating to 
package design. That study showed the complete
inability of industrial designers and
marketing management to predict consumer
tastes and perceptions. The present study
demonstrates how seriously the discrepancy
between what the facts actually were and
what they were believed to be had limited
the effectiveness of the company’s
sales and marketing program.

Data based on new college-level employees are reported which show the
reliability and structure of the Work Components Study (WCS). A compari- 
son of those persons who remain with the company with those who leave at 
the company’s initiative and with those who leave of their own initiative 
shows few differences in their orientations toward work as measured by the 
WCS. However, when the company’s rating of how many years it will take 
the new hires to reach the third level of management was taken as a criterion, 
it was found that those hires who score highest on WCS Score 3, Competitive- 
ness desirability, highest on the SCAT verbal ability score, and highest on a 
personality measure of responsibility are those perceived by the company as 
moving ahead fastest. These results suggest that it is not the “organization 
man” type who is likely to be promoted, but the man perceived to be highly 


competitive, intelligent, and responsible. 


In a recent paper, a revised version of the 
Work Components Study (WCS) was de- 
scribed (Borgatta, Ford, & Bohrnstedt, 1968). 
Data were reported on the between- and 
within-cluster median intercorrelations for the 
scores, as well as reliability statistics for in- 
dependent replications with large samples of 
male and female freshman college students. 
Additionally, information was reported on the 
relationship of the WCS to personality in- 
ventory scores, college entrance test scores, 
and educational and income aspirations. In 
the current paper, use of the WCS with large 
groups of college-level personnel hired by a 
major industrial organization is reported. Data 
are presented on the reliability of WCS scores 
with these samples. However, the central con- 
cern of this paper is the exploration of the 
predictive power of the WCS to employment 
outcomes for a subsample of new college-level 
employees. 

On a voluntary basis, new college-level em- 
ployees of companies in the Bell Telephone 
System completed a questionnaire that in- 
cluded the WCS. The employees were given 
the questionnaire before actual arrival on the 
job (e.g., plant visit) or as soon after arrival 

1 Requests for reprints should be sent to Edgar F. 


Borgatta, Social Science Building, University of Wis- 
consin, Madison, Wisconsin 53706. 


as was convenient. Cooperation was encouraged,
but confidentiality and the voluntary
basis of cooperation were
stressed. Questionnaires were not anonymous,
but the instructions noted that the individual 
data would in no way be made available to 
the cooperating companies and could not af- 
fect the career patterns of the respondent. 
Questionnaires were accompanied by addressed 
envelopes to be mailed directly to the re- 
searchers at the University of Wisconsin. It 
is estimated that a substantial majority of the 
new employees sent in questionnaires, but it 
was not possible to determine the actual pro- 
portion of cooperating employees because of 
differences in level of cooperation of the com- 
panies themselves. 

Usable data were available for 869 male 
and 344 female college-level personnel hired 
during 1964. Employees separated from the 
company were classified by whether the com- 
pany initiated the separation or the employee 
initiated it. Further, there were notations in 
the company’s record as to whether the prog- 
ress of the employee was satisfactory or un- 
satisfactory at the time of separation, but this 
distinction will not be maintained here since, 
in general, employees separated at the initia- 
tive of the company must have been less than 
satisfactory in some respect. Thus, the three 
groups of employees considered were those
still with the company after at least 1 yr., 
those separated by the company (or those 
rated unsatisfactory for the company at time 
of separation), and those who left the com- 
pany at their own initiative but whose progress 
was rated as satisfactory at the time of 
separation. 


A Description of the WCS 


The theoretical basis of the WCS builds 
particularly on the work of Herzberg, Maus- 
ner, and Snyderman (1959), which posits a 
two-level theory of motivation. The first-level 
factor is based on the objective elements of 
the job situation itself (for example, recognition,
possibility for growth, working conditions,
job security, etc.), which result in good
or bad feelings about the job. The second- 
level factors refer to the individual’s attempt 
to relate the job situation to his own felt 
needs (e.g., need for recognition, need for 
growth, etc.). In discussing feelings of un- 
happiness, the authors suggest that it is not 
the job itself, but the conditions which sur- 
round the job which are important in deter- 
mining satisfaction. 

In developing the WCS, attention was paid 
to Herzberg’s factors, especially those which 
indicated concern with conditions surround- 
ing the job. The actual development of items 
can be found in Borgatta (1967) and Bor- 
gatta et al. (1968). The revised version of the 
WCS contains seven scores. A brief descrip- 
tion of each follows: 


1. Potential for personal challenge and de- 
velopment (8 items). This score contains items 
which measure the desire to be in situations 
where there is an opportunity for creative 
work, a chance for as much responsibility as 
one wants, and where there is an emphasis on 
originality and individual ability. 

2. Responsiveness to new demands (7 items). 
The items in this score determine the indi-
vidual’s responsiveness to emergency situa-
tions in the job, changing job assessments,
and similar irregular demands.

3. Competitiveness desirability (and re- 
ward of success) (9 items). Here emphasis is 
on whether or not the individual seeks job 
situations where the salary is determined by
merit, competition is keen, and where there
is emphasis on accomplishment. 

4. Tolerance for work pressure (7 items). 
This score taps attitudes toward situations 
where the work load might be excessive, where 
a person might be on call after hours or might 
even have to take work home. 

5. Conservative security (12 items). These 
items were designed to determine whether the 
individual wants to play it safe and have 
security. Items ask the importance of senior- 
ity, well-defined promotion guidelines, and 
well-defined job routines. 

6. Willingness to seek reward in spite of 
uncertainty versus avoidance of uncertainty 
(12 items). Is the person willing to do in- 
teresting work even though he might get fired 
easily? Would he work with a company with 
interesting work even though it might be a 
short-run job? These are the kinds of atti- 
tudes explored in this score. 

7. Surround concern (9 items). This score 
measures the respondent’s concern with such 
“hygienic” aspects of the job as whether the 
lighting and ventilation are good, whether co- 
workers and supervisors are nice people, and 
whether the community has adequate cul- 
tural, social, and recreational opportunities. 


Some Characteristics of the WCS 


The data from all respondents were pooled 
and reliability estimates calculated. The esti- 
mates used were alphas (Cronbach, 1951) 
and are shown in the diagonal of the matrix in 
Table 1. It will be noted that the lowest coef- 
ficient is .66, and the coefficients range to .83, 
magnitudes indicative of reasonable reliability. 
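
The alpha coefficient referred to above can be sketched in a few lines. This is a minimal illustration of the usual formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores); the function name and the small response matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five respondents to a four-item score
# (illustrative only; not taken from the WCS data).
responses = [
    [4, 3, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.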

The intercorrelations for the WCS scores 
are also indicated in Table 1. For these sam- 
ples of college hires, the intercorrelations 
among Scores 1, 2, 3, and 4 are somewhat 
higher than in the previous study with college 
freshmen (Borgatta et al., 1968), especially 
among the females. While these correlations 
are higher than are desirable, an equally im- 
portant question is whether the individual 
scores differentially predict some criterion. 
As will be seen later, they do. 

While the WCS scores are designed to 
measure various facets of work motivation, 
they might be defined as personality concepts
if personality were broadly defined.

TABLE 1

RELIABILITIES AND INTERCORRELATIONS OF WCS SCORES

Characteristic
1. Potential for personal challenge and development
2. Responsiveness to new demands
3. Competitiveness desirability (and reward of success)
4. Tolerance for work pressure
5. Conservative security
6. Willingness to seek reward in spite of uncertainty vs. avoidance of uncertainty
7. Surround concern

Note. Data above the main diagonal are for 869 male college hires; below the main diagonal for 344 female college hires. Cronbach’s alphas are in the main diagonal.


However, it was demonstrated in Borgatta 
et al. (1968) that there is substantial content 
in the WCS which is independent of that con- 
tained in three personality inventories among 
a sample of college students. The three per- 
sonality inventories—the Behavioral Self-Rat- 
ings (BSR; Borgatta, 1964), the Self-Identi- 
fication Form (S-Ident; Borgatta, 1965), and 
the Interpersonal Orientations Form (IO; 
Borgatta & Bohrnstedt, 1968)—are short 
forms designed to tap personality content 
which is contained in longer forms. The titles 
of the scores are shown in Table 3. Based 
on data in the current report, the reliabilities 
of the S-Ident ranged from .53 to .81, the 
BSR from .63 to .91, and the IO from .55 to 
.85, again reasonable values.

The correlations between the WCS and the 
scores in the three personality forms in the 
current study replicated most of those ob- 
tained in the earlier study with college stu- 
dents indicating that there is substantial in- 
dependent content in the WCS. The correla- 
tion matrix is excluded from the current paper, 
but copies may be obtained from the senior 
author. However, it was deemed important 
also to determine whether or not the per- 
sonality measures and the WCS make inde- 
pendent contributions to the prediction of the 
evaluations made of the hires by the com-
pany. Additionally, it needs to be shown that
both kinds of measures (work orientations 
and personality) can account for variation in 
ratings above and beyond that accounted for 
by measured ability. This analysis shall be 
presented below, but first the authors shall 
examine the relationship of the WCS to an- 
other kind of criterion, employment status of 
the hire after 1 yr. with the company. 


WCS Scores and Employment Status at Least 
One Year After Hiring 


The company’s evaluations of new college- 
level employees were made approximately 1 
yr. after joining the company. Information 
was obtained through a central service rather 
than directly through companies; thus the in- 
formation was really available only on the 
basis of the “census” reports gathered for all 
companies. Since many factors are involved in 
reporting, data that were up-to-date were 
available only in part at the first relevant 
census, and additional information for the 
participants in the study was available only an 
additional year later. For the current com- 
parison, the final information available is re- 
ported in Table 2, which presents mean scores 
for those still with the company, those sepa- 
rated from the company at the initiative of 
the company or who were noted as having
initiated their own separation but who had
a company rating of unsatisfactory progress
at the time of separation (6 cases) and, fi-
nally, additional information is available for
30 employees who initiated their own separa-
tion and who had a company rating of satis-
factory progress at the time of separation.
Examination of the profile of scores for those
still with the company and those separated
from the company with progress unsatis-
factory indicates that the differences in the
profiles are not substantial. The only differ-
ence that occurs that would satisfy a statisti-
cal significance test (symmetric test at .05
level) is between those still with the company
and those who left the company at the com-
pany’s request on WCS Score 3, Competitive-
ness desirability, with the latter group scoring
lower.

TABLE 2

WCS MEAN SCORES AND EMPLOYMENT STATUS AT LEAST ONE YEAR AFTER HIRING

                                                              Separated from company
                                          Still with          Company initiated     Progress
                                          company             or progress           satisfactory
                                                              unsatisfactory
                                          (n = 334)           (n = 56)              (n = 30)
Characteristic                            M       SD          M       SD            M            SD

1. Potential for personal challenge
   and development                        26.1    2.6         26.0    2.8           [illegible]  3.1
2. Responsiveness to new demands          20.2    2.4         19.9    [illegible]   20.5         [illegible]
3. Competitiveness desirability
   (and reward of success)                25.3    3.2         24.2    3.9           25.5         3.0
4. Tolerance for work pressure            17.3    2.8         17.4    [illegible]   16.9         3.6
5. Conservative security                  15.4    4.7         15.4    4.7           14.0         4.9
6. Willingness to seek reward in spite
   of uncertainty vs. avoidance of
   uncertainty                            20.7    5.4         19.9    5.8           21.8         6.4
7. Surround concern                       24.2    3.2         24.5    3.4           24.2         3.8

Note. There were 19 persons of indeterminate status (leave on armed services assignment, etc.) not in the above data.

The group of 30 persons who left the com- 
pany at their own initiative may be com- 
pared with those still with the company. Two 
differences in the profile may be noted. Those 
who left the company and were rated as having 
satisfactory progress were somewhat higher 
on WCS Score 1, Potential for personal chal- 
lenge and development, and somewhat lower 
on WCS Score 5, Conservative security. How-
ever, the sample size involved is small, and
the results may be unstable. Indeed, neither
of these differences would satisfy a sym- 
metric test of statistical significance at the 
.05 level. Thus, at best, whatever difference 
occurs here may merely add fuel to the often
self-critical attitudes of a company that it is
losing some good men, but the facts cannot
be confirmed in these data. Additional dis- 
cussion of such projected differences will be 
found below. 


SCAT Scores and Employment Status 


It is instructive to examine the ability
scores of the three groups as measured by the
School and College Abilities Test (SCAT; 
Educational Testing Service, 1964). For the 
334 cases still remaining with the company, 
the average SCAT score was 97.9. For the 
56 who were separated from the company at 
the initiative of the company, or whose records 
indicated unsatisfactory progress, the average 
SCAT score was 95.2. This difference would 
satisfy a symmetric test of statistical signifi- 
cance at the .05 level. For the 30 persons who 
left the company at their own initiative and 
whose records indicated satisfactory progress, 
the average SCAT score was 98.1. Again, 
while these differences are not large, they are 
consistent with the expectations about the
selective processes once one is employed.
Persons of lesser ability who are either rated 
unsatisfactory or who are separated at the 
initiative of the company tend to have lower 
SCAT scores. Employees who separate at 
their own initiative and who are displaying 
satisfactory progress at the time of termina- 
tion appear to have characteristics similar to 
those who remain with the company. 

It should be noted that the selection process 
with SCAT is already rigorous; that is, in 
general, persons who are hired have SCAT 
raw scores of 91 or higher; they are a select 
group from the point of view of measured 
abilities. Thus, the fact that even in so select 
a group persons of higher measured abilities 
still tend to succeed more is impressive. 


Criterion Ratings as Related to WCS, SCAT, 
and Personality Scores 


In this section two sample bases for analy- 
sis are utilized. One of the determinants of 
sample size for the current data was the avail- 
ability of information concerning employ- 
ment status or availability of Initial Manage- 
ment Development Program (IMDP) ratings, 
which are to be made after 1 yr. of service 
with the company. IMDP ratings are gen- 
erally defined as the number of years esti- 
mated to be required for the employee to 
reach the third level of management. The 
first sample base is constituted of the 334 
cases for which IMDP ratings were available 
in the second census following the year of 
hiring (hereafter called “IMDP after 2 yr.” 
sample). It is not known what factors are 
associated with late reporting of IMDP rat- 
ings, but in order to get more complete in- 
formation, delay of the analysis until the 
second census was required. Since late ratings 
were probably made later in the careers of 
Ss, at least one source of error arises since 
IMDP ratings are defined in terms of time. 
Thus, if there is a delay of a year in the 
IMDP rating, all persons are not being rated 
on exactly the same basis. 

The second sample base is of 390 cases 
and includes those persons with IMDP rat- 
ings at the time of the second census and 
those persons positively identified as separated 
at the initiative of the company or at their 
own initiative under circumstances where the
company files indicate unsatisfactory progress
(hereafter called “Inclusive” sample). In this 
case, the IMDP rating accorded to the em- 
ployee is a value of 8, which is also the value 
coded for persons who were given a rating 
that they would never achieve the third level 
of management. Other than this, the coded 
values ranged from 2 to 7, and it should be 
known that the implicit expectation for those 
in IMDP is that the average person should 
be ready to carry a position at the third level 
of management in 5 yr. Thus, ratings tend to 
be relatively narrow in range as well as sub- 
ject to the possible errors in terms of actual 
time when the ratings are made. 
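
The coding rule for the criterion in the two sample bases can be sketched as follows. This is only an illustration under stated assumptions: the function and constant names are hypothetical, and a rating of "never" is represented here by `None`. The code of 8, the rated range of 2 to 7, and the counts 334 and 56 are taken from the text.

```python
NEVER_OR_SEPARATED = 8  # value coded for "never" ratings and unsatisfactory separations


def code_imdp(years_estimate=None, separated_unsatisfactory=False):
    """Return the coded IMDP criterion value for one hire (hypothetical helper)."""
    # Hires separated by the company, or separated with unsatisfactory
    # progress, receive the least favorable code, as do "never" ratings.
    if separated_unsatisfactory or years_estimate is None:
        return NEVER_OR_SEPARATED
    if not 2 <= years_estimate <= 7:
        raise ValueError("rated values in the text range from 2 to 7")
    return years_estimate


# Sample sizes reported in the text.
rated_after_2_yr = 334          # hires with IMDP ratings at the second census
separated_unsatisfactory = 56   # company initiated or progress unsatisfactory
inclusive_n = rated_after_2_yr + separated_unsatisfactory

print(code_imdp(5))                               # a hire rated ready in 5 yr.
print(code_imdp(separated_unsatisfactory=True))   # a separated hire
print(inclusive_n)
```

Because larger codes are less favorable, adding the separated hires at a value of 8 extends the unfavorable end of the criterion rather than changing its direction.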

The IMDP ratings are, from the point of 
view of a company, an early indication of 
progress and potential. Further, in the sense 
that favorable impression leads to the imple- 
mentation of favorable action—the self-ful- 
filling prophecy—the IMDP ratings should 
correspond to performance. Obviously, ex- 
amination of what actually happens will re- 
quire a subsequent follow-up, but in the 
interim the IMDP ratings may be revealing 
since they indicate what kinds of persons are 
perceived to be good by the company. 

The IMDP ratings have a number of in- 
trinsic limitations as criteria, however, which 
need to be elaborated. The restricted range 
of the ratings has already been noted, and, 
indeed, roughly 70% of ratings made indicate 
that the person will reach the third level of 
management in 4, 5, or 6 yr. Additionally, 
since the employees in this study are drawn 
from many companies, it has to be recognized 
that there are company differences, and these 
are relatively systematic biases. If in a com-
pany it is known that it takes somewhat
longer to reach the third level of management 
than in another company, this will systemati- 
cally influence the ratings. Further, inde- 
pendently of merit which may be rewarded in 
other ways, persons in some occupational 
categories may take longer to reach the third 
level of management and this may be due to 
overhiring tendencies or to a lesser frequency 
of vacancies than was built into the hiring 
formula. Thus, realistic differences in the way 
the reward system operates within the com- 
pany may interfere with the effectiveness of 
IMDP ratings as a criterion.

A note on the exact source of information is required
with regard to the IMDP ratings. At
the first census, IMDP ratings were available 
for very few hires. At the census of 1965, in- 
formation was available for 286 persons. 
Thus, for 48 persons, the IMDP ratings be- 
came available after that census for various 
reasons. Indeed, the status of 19 persons ap- 
propriate for the study was still indeterminate 
at the time of this analysis for various rea- 
sons, such as leave of absence for the armed 
services, training, and so forth. The 334 cases 
with IMDP ratings in this part of the analy- 
sis, thus, are representative of all persons for 
whom a rating was available and who were 
still employed when the final check on data 
availability was made. Although the data are 
not presented here, the subsample of 286 
cases for whom the IMDP ratings were avail- 
able at the earlier point in time tend to show 
slightly higher relationships with the IMDP 
ratings, and the IMDP ratings have a slightly 
smaller standard deviation. Thus, even though 
the range of the ratings may be a little 
smaller at that earlier point in time, they ap- 
peared to bear slightly more relationship to 
the variables which are being used to predict 
the IMDP ratings. This might mean that the 
earlier set of ratings represents a more homo- 
geneous set of standards and a more reliable 
criterion. However, these small differences 
will be ignored and the larger sample utilized. 

IMDP ratings are less favorable the larger 
the value (since the value is the number of 
years estimated before readiness to perform 
well at the third level of management). Thus, 
in our analysis things that are negatively 
correlated to IMDP ratings are associated 
with the desirable side of the continuum. 

Table 3 presents the correlations of IMDP 
ratings with WCS, SCAT, and personality 
scores which are derived from the forms com- 
pleted voluntarily by the participants in this 
research. 

Table 3 contains both zero-order and mul- 
tiple correlations as well as the regression 
equations for estimating IMDP ratings from 
the WCS scores, IO scores, and personality 
measures. The table is broken into two parts 
(Columns 1-4 and 5-8) to coincide with the 
definitions of the IMDP after 2 yr. and Inclu-
sive sample described earlier. Columns 1 and
5 contain the zero-order correlations between
the 25 independent variables and IMDP rat- 
ings for the two samples. Columns 2 and 6 
contain the standardized regression coefficients 
(SRC) for estimating IMDP scores from the 
WCS alone. Columns 3 and 7 show the SRCs 
for the regression of IMDP ratings on the 
WCS and SCAT scores. Finally, Columns 4 
and 8 indicate the SRCs for estimating IMDP 
ratings from all scores (i.e., WCS, SCAT, and 
the three personality measures). The multiple 
correlations associated with each column are 
at the bottom of Table 3. 
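
The relation between zero-order correlations, standardized regression coefficients, and the multiple R can be illustrated for the simplest case of two predictors. This is a minimal sketch using the standard two-predictor formulas; the function name and the three input correlations are hypothetical and are not values from Table 3.

```python
from math import sqrt


def two_predictor_src(r_y1, r_y2, r_12):
    """Standardized regression coefficients and R squared for two predictors.

    r_y1, r_y2: zero-order correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    # With standardized variables, R squared is the sum of beta times r.
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared


# Hypothetical correlations chosen only for illustration.
beta1, beta2, r_squared = two_predictor_src(r_y1=-0.24, r_y2=0.11, r_12=-0.20)
print(f"SRC 1 = {beta1:.3f}, SRC 2 = {beta2:.3f}, R = {sqrt(r_squared):.3f}")
```

When the predictors are uncorrelated, each coefficient equals its zero-order correlation; the more they overlap, the more the coefficients depart from the correlations.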

In this discussion the data for the IMDP 
after 2 yr. sample will be used as the point 
of departure. The data for the larger sample 
which includes also those no longer with the 
company who left earlier at the initiation of 
the company or with a notation on their 
records of unsatisfactory progress (n = 390)
tend to be parallel to those of the smaller 
sample. It is left to the reader to examine the 
exact differences. In general, slight differences 
exist with a little less of the variance in IMDP 
ratings explained in the larger sample. 

The largest single zero-order correlation 
coefficient with the IMDP ratings is WCS 
Score 3, Competitiveness desirability. Ap- 
parently, persons who indicate at the time 
they are employed that they want to get into 
situations where they have a chance of being 
rewarded for success and where they have an 
opportunity to show their abilities competi- 
tively are those who are receiving IMDP 
ratings of a fewer number of years, that is, 
are predicted to reach the third level of man- 
agement earliest. 

The second largest correlate with the IMDP
ratings is Behavioral Self-Rating Responsi- 
bility (BSR), the content of which is orienta- 
tion to task completion, responsibility, and 
conscientiousness. 

Since there are a fair number of zero-order 
relationships beyond those discussed, examina- 
tion of these will be left to the reader. The 
authors now proceed to the regression analysis 
examining the standardized regression coef- 
ficients (SRC) instead of the zero-order cor- 
relations. 

Using the seven WCS scores only (Column 
2), the multiple R is .31. This means that a 
large amount of variance is left unexplained,
but the circumstances under which predic-
tion is occurring should be recalled as highly
restrictive. For example, the sample is re- 
stricted in intelligence and motivation when 
hired. Further, the sample represents those 
who are succeeding after at least 1 yr. Under 
these circumstances, it can reasonably be as- 
serted that any prediction is interesting and 
important. As Cronbach (1960, pp. 348-351) 
indicates, assessments which have relatively 
low correlations can be very beneficial to a 
company in its hiring if the selection ratio is 
not high (i.e., if the company has an ample 
supply of applicants), and if individual dif- 
ferences in the ability to perform the job are 
large. Cronbach notes that “coefficients as
low as .30 are of definite practical value 
[p. 349].” 
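
The dependence of practical value on the selection ratio can be illustrated with a simple calculation. This is a sketch under assumptions that are not in the text: predictor and criterion are taken to be bivariate normal, applicants are selected strictly from the top of the predictor, and the expected criterion standing of those selected is the validity times their mean predictor z score. The function name and the two selection ratios are hypothetical.

```python
from statistics import NormalDist


def expected_gain(validity, selection_ratio):
    """Expected mean criterion z score of selected applicants (top-down selection)."""
    std_normal = NormalDist()
    # Predictor cutoff that admits the top `selection_ratio` of applicants.
    cutoff = std_normal.inv_cdf(1 - selection_ratio)
    # Mean predictor z score of those above the cutoff.
    mean_predictor_z = std_normal.pdf(cutoff) / selection_ratio
    return validity * mean_predictor_z


# A validity of .30 under a strict and a lenient selection ratio.
for ratio in (0.20, 0.80):
    gain = expected_gain(0.30, ratio)
    print(f"selection ratio {ratio:.2f}: expected gain {gain:.2f} SD")
```

Under these assumptions the same modest validity yields a much larger expected gain when only a small proportion of applicants needs to be hired.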

There are two statistically significant re- 
gression coefficients. First, WCS Score 3, 
Competitiveness desirability, has a standard- 
ized regression coefficient of .12. Thus, Con- 
servative security is involved in a negative 
way in the assessment of performance, al- 
though the involvement is quite small. 

Although not shown in Table 3, the im- 
portance of the motivational information may 
be judged in contrast to performance on 
SCAT scores. For the two subscores of SCAT 
(which are a more efficient predictor than the 
total score), the multiple R is .16, and virtu- 
ally all of the variance is accounted for by the 
verbal score. Again, here it has to be em- 
phasized that the group is explicitly select 
from the point of view of measured ability, 
not only in the general use of a SCAT score 
of 91 or higher in hiring, but also because 
those who have dropped out of the sample 
to this point have statistically significant 
lower SCAT scores. The persistent relation- 
ship of SCAT to the IMDP criterion is of 
particular interest from the point of view of 
validity of selection procedures. Since the 
motivational criteria in selection are less 
formally applied, essentially there is more 
opportunity for a relationship to be observed. 

From the multiple R resulting from the 
regression of IMDP from both the seven WCS 
scores and the two SCAT scores, it is demon- 
strable that the predicted variance from the 
two types of information is relatively inde-
pendent since the standardized regression co-
efficients remain almost identical in the com-
bined computation. Further, the coefficient
of determination of the overall multiple R is 
only .004 less than the sum of the two separate 
coefficients of determination. 
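
The arithmetic behind this statement can be checked directly from the multiple Rs reported in the text (.31 for the WCS scores, .16 for the SCAT subscores) and the stated shortfall of .004. The variable names are illustrative only.

```python
from math import sqrt

r_wcs = 0.31     # multiple R, seven WCS scores alone
r_scat = 0.16    # multiple R, two SCAT subscores alone
shortfall = 0.004  # amount by which the combined R squared falls below the sum

# Coefficients of determination are the squared multiple Rs.
r2_sum = r_wcs ** 2 + r_scat ** 2
r2_combined = r2_sum - shortfall
r_combined = sqrt(r2_combined)

print(f"sum of separate R squared values: {r2_sum:.4f}")
print(f"combined R squared:               {r2_combined:.4f}")
print(f"combined multiple R:              {r_combined:.2f}")
```

The resulting combined R of about .34 agrees with the value given later in the text for the WCS and SCAT scores together.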

It should be noted that the difference be- 
tween R=.16 (the R based only on the 
SCAT scores) and R= .31 is significant at 
the .05 level by an F test indicating that the 
WCS does add predictive variance in IMDP 
ratings above and beyond that explained by 
SCAT scores alone.
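
An F test of this kind can be sketched with the usual formula for the increment in R squared between nested regression equations. The computational details are not given in the text, so the following rests on assumptions: N is taken as the 334 cases of the IMDP after 2 yr. sample, the reduced equation contains the two SCAT subscores, the full equation contains those plus the seven WCS scores, and the full multiple R is taken as .34, the value reported later in the text for WCS and SCAT combined. The function name is hypothetical.

```python
def f_increment(r_full, r_reduced, n, k_full, k_reduced):
    """F ratio for the gain in R squared when predictors are added."""
    r2_full = r_full ** 2
    r2_reduced = r_reduced ** 2
    df1 = k_full - k_reduced   # number of predictors added
    df2 = n - k_full - 1       # residual degrees of freedom, full equation
    f_ratio = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f_ratio, df1, df2


# SCAT alone (2 predictors) versus SCAT plus WCS (9 predictors).
f_ratio, df1, df2 = f_increment(r_full=0.34, r_reduced=0.16,
                                n=334, k_full=9, k_reduced=2)
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

Under these assumptions the F ratio is well above the tabled .05 critical value for 7 and 324 degrees of freedom, which is a little over 2.0, so the sketch is consistent with the conclusion stated above.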

In order to facilitate interpretation of the 
relationship of the WCS, SCAT, and personal- 
ity scores with regard to the IMDP ratings, a 
multiple R was computed, involving all these 
scores. These standardized regression coef- 
ficients are also to be found in Table 3 (Col- 
umns 4 and 8). The standardized regression 
coefficients that would be judged statistically 
significant, if one were testing hypotheses 
using a symmetric test with α = .05, are
italicized. Since a large number of variables 
are involved in the prediction, questions 
about the uniqueness of the sample are im- 
portant in any interpretation. One needs to 
be relatively cautious, in other words, and 
some of the findings should be interpreted as 
of speculative interest rather than as hard 
facts. Still, it is of interest to see what is 
happening. Some of the relationships that 
might have been statistically significant as 
first-order r’s disappear. Other zero-order re- 
lationships which were not visible (sup- 
pressed) become visible. For example, the 
S-Ident intellectual orientation appears to in- 
crease in magnitude and become significant in 
the prediction equation. That is, concern with 
intellectual matters (in the sense of interests 
or activities that are normally classed as in- 
tellectual in nature) appears to be nega- 
tively associated with a favorable IMDP rat- 
ing. The BSR responsibility score is posi- 
tively associated with a favorable IMDP 
rating, and it should be noted that the SCAT
verbal score appears to have greater promi-
nence in the multiple R involving more vari-
ables. Apparently, then, some personality
characteristics are acting as suppressors mask-
ing the relationship of SCAT and IMDP,
which is stronger when the personality char-
acteristics are taken into account.

TABLE 3

MULTIPLE CORRELATION OF WCS, SCAT, AND PERSONALITY SCORES WITH IMDP RATINGS

Samples: IMDP after 2 Yr. (n = 334); Inclusive sample (n = 390)

Characteristic                                          r

WCS
  Potential for personal challenge and development      .01
  Responsiveness to new demands                         -.03
  Competitiveness desirability (and reward of success)  -.24a
  Tolerance for work pressure                           -.04
  Conservative security                                 [illegible]
  Willingness to seek reward in spite of uncertainty    -.09
  Surround concern                                      [illegible]
SCAT
  Verbal score                                          [illegible]
  Quantitative score                                    -.09
S-Ident
  Leadership                                            [illegible]
  Impulsivity                                           [illegible]
  Intellectual orientation                              .00
  Aloofness                                             .10
  Self-depreciation and low morale                      [illegible]
  Lack of tension                                       -.09
BSR
  Assertiveness                                         -.14
  Likeability                                           .13 [sign illegible]
  Intelligence                                          -.14
  Emotionality                                          .06
  Responsibility                                        -.19
IO
  Independence-autonomy                                 -.04
  Social dependency                                     .04
  Sociability                                           -.03
  “I tend to be dependent on others.”                   .15

a These standardized regression coefficients would be statistically significant in a symmetric hypothesis test with α = .05.


Thus, the major variables of first-order cor-
relations with IMDP ratings persist when the
standardized regression coefficients are ex- 
amined. The multiple R in this examination 
which involves 25 variables is .43. Roughly 
19% of the variance of the IMDP ratings is 
accounted for. Considering the restrictions 
placed on the criteria and on the prediction 
variables as noted earlier, this is substantial 
prediction. 
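
The proportion of variance accounted for, and the caution needed with 25 predictors, can be illustrated as follows. The squared multiple R comes from the text; the adjustment for the number of predictors is the familiar shrinkage formula, added here only as an illustration and not reported by the authors. N is assumed to be the 334 cases of the IMDP after 2 yr. sample.

```python
r_multiple = 0.43   # multiple R reported for all 25 predictors
n_cases = 334       # assumed: IMDP after 2 yr. sample
n_predictors = 25

# Proportion of criterion variance accounted for in the sample.
r_squared = r_multiple ** 2

# Shrinkage-adjusted estimate, penalizing the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)

print(f"R squared:          {r_squared:.3f}")
print(f"adjusted R squared: {adjusted_r_squared:.3f}")
```

On these assumptions the adjusted figure is noticeably smaller than the sample value, which is in line with the authors' advice to treat findings based on many predictors with caution.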

The increase in R which occurs when the 
16 personality variables are added to the WCS 
and SCAT scores appears to be real rather
than a function of the number of predictors
since an F test indicates that the difference 
between R= .43 and R = .34 is significant 
at the .05 level. 


SUMMARY AND CONCLUSION 


First, data based on new college-level em- 
ployees are reported which indicate the re- 
liability of the WCS. The structure of the 
WCS is explored for the samples and sug- 
gests that some scores are slightly more cor- 
related with one another than was reported 
in a prior study. 


Second, examination of the new employees 
for whom all types of data were available 
suggests that there are no gross differences on 
motivational bases as measured by the WCS 
between those who remain with the company 
and those who leave the company at the initia- 
tive of the company, or who are rated as 
having unsatisfactory progress but leave of 
their own initiative. A statistically significant 
difference does occur, however, with the em- 
ployees still with the company higher on 
WCS Score 3, Competitiveness desirability, 
than those who left at the company’s request. 
A group of Ss who left the company at their 
own initiative and who were rated as showing 
satisfactory progress at the time appear to 
be somewhat higher on WCS Score 1, Poten- 
tial for personal challenge and development, 
and lower on WCS Score 5, Conservative 
security, but the differences are not statisti- 
cally significant. If such differences, which are 
the kinds of self-critical expectations that 
frequently arise in large organizations, were 
true, projection of the current data would 
indicate that the differences would still be 
extremely small. The SCAT scores of the 
employees with ratings of unsatisfactory prog- 
ress who leave at the initiative of the com- 
pany are lower than those still with the com- 
pany. Those who leave the company at their 
own initiative and with ratings of satisfactory 
progress have almost identical SCAT scores 
as those still with the company. Since the 
SCAT scores of these latter two groups are 
the same, and since it was noted subsequently 
that there is no difference between these two 
groups on WCS Score 3, Competitiveness de- 
sirability, the meaningfulness of the differ- 
ences between the two groups on the other 
scores, even if they did exist, is brought into 
question. While it is true that there is a pos- 
sible small involvement between a high score 
on WCS Score 5, Conservative security, and 
a poor rating on the criterion, the association 
is relatively small and appears to fade in the 
presence of a larger number of variables. At 
least in these data, then, there is no indication 
that persons with extraordinary ability or 
motivation are leaving. 

Third, when Initial Management Develop- 
ment Program ratings are used as a criterion, 
it is clear that the hires who score highest
on WCS Score 3, Competitiveness desirability,
are those perceived by the company as likely 
to reach the third level of management the 
most quickly. It has a larger regression weight 
than the SCAT abilities score although the 
variance in the latter score is attenuated be- 
cause no one with SCAT scores less than 91 
was hired. Also noted was a significant regres- 
sion coefficient indicating that individuals who 
score high on BSR responsibility are also 
perceived as moving up quickly. A smaller 
negative relation between this perception and 
S-Ident intellectual orientation was also noted. 
Given the results of this study, it apparently 
is not the “organization man” who does not 
“rock the boat” who is perceived as likely to 
be promoted quickly, but the highly mo-
tivated, intelligent, and responsible individual
who is. 

While the sample sizes and the investment 
in data collection are substantial in the study 
reported, any practical implications must be 
advanced quite tentatively. Obviously, the 
setting is clear for additional studies to am- 
plify and to explore additional aspects of the 
problem. For example, hard criteria of suc- 
cess and failure will develop not only for the 
subsample discussed in this analysis, but for 
the more inclusive sample for which data were 
collected through the questionnaire. That is, 
as a natural consequence of the application 
of the reward systems of the company, better 
criterion data will become available for analy- 
sis of the WCS, SCAT, and personality scores, 
and for the understanding of the progress 
through which people proceed in their careers. 

As noted above, however, it is too early 
to suggest that implications should be drawn 
seriously from the current research. What is 
emphasized is that the stability of the WCS 
scores and their apparent involvement in 
preliminary study with the IMDP criterion 
as reported here suggest need for more de- 
tailed study with hard criteria and more sub- 
stantial samples. Equally, the study suggests 
that the multiple-factor base of the WCS 
militates for extreme caution in more simplistic 
and ad hoc theories of work motivation and 
orientation. Finally, the persistent relationship 
of SCAT to a criterion like IMDP suggests
the importance of general abilities in selec-
tion. Hopefully, further research on motiva-
tional factors will lead to equally consistent
results.


INFLUENCE OF SEX ROLES ON THE MANIFESTATION
OF LEADERSHIP


Situational factors influencing the manifestation of dominance (Do) were
investigated by pairing Ss high and low on the CPI Do scale and having 
them interact in tasks in which one had to lead and the other follow. In 
experiments using a masculine industrial task and a sexually neutral clerical 
task, the following S pairs were studied: High and Low Do men (Group 1), 
High Do men and Low Do women (Group 2), High Do women and Low Do 
men (Group 3), High and Low Do women (Group 4). Assumption of leader- 
ship by the High Do women in Group 3 was significantly lower in both 
studies. This was attributed to sex role conflict inhibiting the manifestation of 
Do. Analyses of the decision-making process supported this interpretation. 


In a recent study in this journal, Megargee, 
Bogart, and Anderson (1966) reported the 
results of an attempt to predict leadership in 
a simulated industrial task using the Domi- 
nance scale of the CPI. They found that when 
men who were high in dominance (High Do) 
were paired with men who were low in 
dominance (Low Do) and exposed to a situa- 
tion in which one had to act as the leader 
and the other as the follower, the High Do 
individual assumed the leader role 90% of 
the time when the instructions stressed leader- 
ship, but only 56% of the time when the in- 
structions did not stress leadership. They con-
cluded that “the conditions under which
leadership is to be exercised are as important 
as the personality trait of dominance in de- 
termining whether or not dominant behavior 
will be manifested [p. 295].” They suggested 
that further research should be undertaken 
in an effort to determine the situational fac- 
tors which facilitate or inhibit the overt ex-
pression of the trait of dominance.

In addition to its intrinsic theoretical in- 
terest, this problem is particularly important 
for the applied psychologist. Knowing how
personality traits interact with social situa- 
tions will enable him to predict behavior 
more accurately. Moreover, such data would
help him design settings which would allow 
the High Do individual to express his leader- 
ship ability to its fullest. Research by Smel- 
ser (1961) has shown that when two people 
must cooperate on a task, the productivity or 
achievement of the group is highest when the 
High Do individual is the leader and the Low 
Do individual the follower. If certain factors 
inhibiting the High Do individual from as- 
suming the leader role can be identified, then 
steps might be taken which would remove 
these impediments, thereby increasing the 
achievement level of the group and the job 
satisfaction of its members. 

The studies to be reported in the present 
paper investigated how social sex role pre- 
scriptions influence the expression of leader- 
ship by High Do men and women. In our 
society it is generally considered appropriate 
for men to dominate women but not vice 
versa. Most managerial or executive positions
are held by men, and while women do not 
usually feel uncomfortable working for men, 
men may feel quite discomfited working at
the direction of women. It seemed likely that 
these social role prescriptions would act to 
inhibit High Do women from assuming lead- 
ership when paired with Low Do men, but 
might facilitate the assumption of leadership 
by High Do men paired with Low Do women. 
STUDY I
Method 


Subjects. A 113-item test labelled the Gough In- 
ventory which consisted of all the items from the 
CPI Do, Cm, and Gi scales was administered to ap- 
proximately 600 students in introductory psychology 
classes at the University of Texas, Austin. From this 
pool of Ss, four groups were formed with 20 pairs 
of Ss in each. Group 1 consisted of High Do men 
paired with Low Do men, Group 2 of High Do 
men paired with Low Do women, Group 3 of High 
Do women and Low Do men, and Group 4 of High 
Do women and Low Do women. The Ss in each 
pair were at least 20 T score points apart on the 
Do scale. 

Apparatus. In their original study, Megargee et al. 
(1966) used a large box which rested on its side in 
such a way that the follower had to crawl into it on 
his hands and knees. While this menial position was 
well suited to an all-male study such as theirs, it 
was necessary to modify the apparatus somewhat 
before it could be used by women wearing skirts and 
stockings. Therefore, the box was placed in an 
upright position so that it resembled a large tele- 
phone booth without a door. Midway up the side 
opposite the entrance, 100 $-in. holes were drilled 
2 in. apart in a 10 X 10 square pattern. Each hole 
was filled with a slot-headed bolt 1 in. long and ¢ in. 
in diameter with the slotted head and a washer in- 
side the box, and a square nut tightly screwed onto 
the bolt outside the box. Because of the narrowness 
of the 4-in. bolt relative to the $-in. hole, the only 
way the nut on the outside could be unscrewed ef- 
ficiently was for one partner to enter the box and 
hold the bolt in place with a screwdriver while the 
other partner remained outside and unscrewed the 
nut with a wrench. The size of the box precluded one 
person manipulating both the wrench and the screw- 
driver simultaneously. 

Five of the nuts were painted red, 20 were painted 
yellow, 25 were painted green, and 50 were un- 
painted. The colors were randomly distributed around 
the grid. None of the bolts or washers inside the 


TABLE 1

NUMBER OF HIGH DO Ss ASSUMING LEADER AND
FOLLOWER ROLES IN THE TWO STUDIES

                                      Group
                                1     2     3     4
Study I (a)
  No. assuming leader role     15    18     4    14
  No. assuming follower role    5     2    16     6
  N                            20    20    20    20
Study II (b)
  No. assuming leader role     10    14     4    12
  No. assuming follower role    6     2    12     4
  N                            16    16    16    16

a χ² = 23.96, p < .001.
b χ² = 14.94, p < .005.

box were painted, so the person on the inside had
no way of determining the location of the different 
colored nuts. 

Procedure. After the pairs of Ss had been formed, 
the individuals were contacted by phone and a time 
arranged when both could come to the laboratory. 
At the appointed time each pair was led to the 
room in which the apparatus was set up. The fol- 
lowing instructions were then read with the italicized 
words emphasized: 

This is a study of the relation between the 
Gough Inventory and leadership under stress. This 
box represents a machine and you are a team of 
troubleshooters who are to repair it in the fastest 
possible time. The repair that must be made is to 
remove all the yellow nuts, leaving the red, green, 
and unpainted ones in place. One person, who is 
the leader, is to stay outside in front of the 
machine and the other, who is the follower, must 
go inside. The leader must locate the yellow nuts, 
call out their location to the follower, and remove 
them using this wrench. The follower must obey 
the leader’s commands and, using this screwdriver, 
hold the bolts in place while the leader removes 
the nuts. It is up to you to decide who will be the 
leader and who will be the follower. 

Any questions? OK. I shall start timing you 
now. 


Results 


The results are presented in the upper por- 
tion of Table 1. In Groups 1 and 4, in which 
both partners were of the same sex, 75% of 
the High Do men and 70% of the High Do 
women took the leader role. This replicated 
the findings on the predictive validity of the 
Do scale for men obtained by Megargee, Bo- 
gart, and Anderson (1966) and extended them 
to women. 

In Group 2, High Do men were paired with 
Low Do women. With differences in dominance 
and social role expectations both operating in 
the same direction, the manifestation of 
leadership by the High Do men was fa- 
cilitated and 90% assumed the leader role. 
In Group 3, in which High Do women were 
paired with Low Do men, dominance con- 
flicted with sex role. As expected, this in- 
hibited the assumption of leadership by the 
High Do women. Only 20% assumed the 
leader role over the Low Do men. These dif- 
ferences between the four groups were highly 
significant (χ² = 23.96, p < .001).
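
The reported statistic can be reproduced from the Study I counts in Table 1. The following is a minimal sketch, assuming that a Pearson chi-square test of independence was applied to the 2 × 4 table of role by group; the function name `chi_square` is an illustrative choice and does not come from the original report.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of role and group.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Study I counts from Table 1, Groups 1 through 4.
study1 = [
    [15, 18, 4, 14],  # High Do S assumed the leader role
    [5, 2, 16, 6],    # High Do S assumed the follower role
]
chi2_study1 = chi_square(study1)
df = (len(study1) - 1) * (len(study1[0]) - 1)
print(round(chi2_study1, 2), df)  # prints 23.96 3
```

With 3 degrees of freedom, a value of this size lies well beyond the .001 critical value of 16.27, in agreement with the significance level reported.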


Discussion 


These results clearly indicated that social 
role conflict could seriously inhibit the ex-
pression of leadership by High Do women.
This raised two questions: 

1. How generalizable were these findings?
The simulated industrial task used in this 
study was an extremely masculine one on 
which it would be natural for women to defer 
to men. Could similar results be obtained on 
a sexually neutral task? 

2. Were the differences observed in Group 
3 the result of increased assertiveness by the 
Low Do men or of greater submissiveness on 
the part of the High Do women? 

To answer these questions a second study 
was designed and carried out. 


STUDY II
Method 


Subjects. Students enrolled in introductory psy- 
chology classes at the University of Texas, Austin, 
were pretested with the Gough Inventory and as- 
signed to groups in the same fashion as in Study I 
except that 16 rather than 20 pairs of Ss were used 
in each group. 

Apparatus. The purpose of this study was to 
replicate Study I using a sexually neutral two-person 
task with well-defined leader and follower roles and 
instructions which emphasized leadership. After some 
thought it was decided to use a simulated clerical 
situation with the leader dictating to the follower. 
The setting was designed to emphasize the difference 
between the leader and follower roles. It consisted of 
a table with a screen in the middle and chairs on 
either side. The leader’s chair was an executive-type 
swivel chair with arms. In front of it rested a 
leather-covered loose-leaf binder containing the ma- 
terial to be dictated. A sign on the screen above the 
binder read “LEADER’S SIDE.” The follower’s 
chair was a straight-backed wooden chair. He had 
no binder for his papers and on his side of the 
screen a sign read “FOLLOWER’S SIDE.” 

The leader’s notebook contained a page from the 
Stroop Color-Word Test (Stroop, 1935) which had 
the names of four colors, each printed in ink of a 
different color, such as the word “red” printed in 
blue ink. The follower was supplied with a mimeo- 
graphed form on which the initial letters of the four 
color names were printed, once for each word on the 
leader’s sheet.

Procedure. Each pair of Ss was led to the testing 
room where the following instructions were read 
with the italicized words emphasized: 


This is a study of the relation between the 
Gough Inventory and leadership under stress. One 
important aspect of leadership is the ability to 
concentrate, remain calm, and accurately give 
directions to a subordinate. This is what is re- 
quired by this task. The leader will have to absorb 
information and rapidly pass it on to the follower. 
The information to be transmitted is the color
of the ink on a series of words. This is the 
leader’s side of the table. On the table is a sheet 
of paper on which are printed the names of four 
colors: blue, green, orange, and red, as you can 
see in the leader’s sample. Four different colors of
ink, blue, green, orange, and red, have been used 
in printing the color names. As you can see, the 
word “orange” is printed in blue ink. The job of 
the leader is to tell the follower the color ink used, 
ignoring the printed word. Since the first word is 
printed in blue ink, the leader should call “Blue”; 
since the second word is printed in orange ink, he 
should call out “Orange,” and so on. 


[E moves around table.] 


This is the follower’s side of the table. The 
follower must record the information given him by 
the leader. The follower’s sheet of paper has the 
letters “B,” “G,” “O,” and “R” printed, once for 
each word on the leader’s sheet. As the leader calls 
out the colors of the ink, the follower must record 
the information by crossing out the letter cor- 
responding to the color, B for blue, G for green, 
O for orange, and R for red. 

This sample has been filled out to correspond to 
the leader’s sample. Remember the first word on 
the leader’s sample was “Orange” printed in blue
ink; therefore the leader should have said “Blue”
and the follower should cross out the letter “B.”
[If not clear give more examples.]

Thus the leader must transmit as much in- 
formation to his subordinate as possible in the 
time allowed, as if giving orders in a crisis. The 
follower must record what his leader tells him. 

It is up to you to decide who will be the 
leader and who will be the follower. When you 
have decided, take your seats at the leader or fol- 
lower’s sides of the table. When the leader is sure 
the follower is ready to record the information, he 
should call out “Start,” turn over the sample page, 
and start calling out the names of the different- 
colored inks. When the leader calls out “Start,” I 
shall begin timing you; the leader will have 90 
sec. to transmit as much information as possible. 

Any questions? 

OK. Decide who will be leader and who will be 
follower and take your positions. 


After reading the instructions, E turned on a tape
recorder which recorded the discussions which pre-
ceded the choice of leader. Later E transcribed these
records, noting any additional nonverbal behavior
such as an S simply sitting down in one position or
shrugging when asked his opinion.


Results and Discussion 


The data regarding leadership choice are 
presented in the lower section of Table 1. 
Once again significant differences were ob-
tained (χ² = 14.94, p < .005). The propor-
tion of High Do Ss in each group assuming

FIG. 1. Proportion of High Do Ss assuming leadership
in Studies I and II (vertical axis: percent High Do Ss
assuming leader role; horizontal axis: group).

the leader role in the two studies are plotted 
together in Figure 1. It can be seen that the 
two studies yielded virtually identical re- 
sults. The decrease in the manifestation of 
leadership by High Do women when paired 
with Low Do men is therefore not limited to 
highly masculine tasks such as that used in 
Study I. 

Next, the records and notes of the decision- 
making process were analyzed to determine 
whether this phenomenon was the result of 
increased submissiveness on the part of the 
High Do women or of greater assertiveness 
on the part of the Low Do men. It was first 
noted which S made the final decision. Next 
the behavior of the partner who made the 
decision was studied to determine whether he 
appointed himself or his partner leader. Those 
who appointed themselves leaders were fur- 
ther divided into those who said in effect, 
“T’ll be leader,” and those who said, “You be 
the follower.” Similarly those who appointed 
the other partner leader were subdivided into 
those who said, “You be leader,” and those
who said, “I'll be follower.” This third level 
of analysis did not prove to be useful in dif- 
ferentiating the groups. The behavior of those 
who allowed their partners to decide was also 
examined and a record kept of those who ac- 
tively said, “You decide,” and those who 
simply remained acquiescent or said nothing. 
This, too, did not prove enlightening. 

Interesting group differences were found in 
the analysis of which partner made the final 
decision and, more important, whether he 
decided he or his partner should be leader. 
Examination of these data in Table 2 shows 
that Groups 1, 2, and 4, in which dominance 
did not conflict with social role, followed quite 
similar patterns. The first row in Table 2 
shows the proportion of High Do Ss in each 
group who actually made the final decision. 
It can be seen that the High Do Ss in these 
three groups often let the Low Do partner 
make the decision. The major difference be- 
tween the High and Low Do Ss was not in 
who made the decision, but in the nature of 
the decision which was made. The second row 
in Table 2 shows that when the High Do S 
in these three groups made the decision, he 
usually appointed himself leader. However, 
when the Low Do S in these groups made the 
decision, he generally appointed his partner 
leader. (In many of these cases the High Do 
S had indicated verbally or nonverbally that 
while the decision was up to the Low Do 
partner, the High Do S would not be averse 
to assuming leadership. ) 

In Group 3, in which the pairing of High 
Do women with Low Do men brought domi- 
nance and sex role into conflict, there was a 
major difference in the behavior of the High 
Do Ss but not in the behavior of their Low 
Do partners. The High Do women made the 
final decision more often than the High Do Ss 
in any other group, and 91% of the time they 
appointed their Low Do male partners as 
leader. This was in marked contrast to the 
behavior of the High Do Ss in other groups 
who, if they made the final decision, never 
selected their partner as leader more than 33% 
of the time. 

On the other hand, a comparison of the 
behavior of the Low Do men in Group 3 with 
that of the Low Do Ss in the other three 

TABLE 2

NUMBER OF HI AND LO DO Ss IN EACH GROUP MAKING FINAL DECISION AND
THE NATURE OF THE DECISION MADE

                                                      Group
                                    Ss           1     2     3     4
No. of Ss making decision           Hi Do   n    9     8    11     6
                                            %   56    50    69    38
                                    Lo Do   n    7     8     5    10
                                            %   44    50    31    62
No. of Ss appointing self leader    Hi Do   n    6     7     1     4
                                            %   67    88    09    67
                                    Lo Do   n    3     1     2     2
                                            %   43    12    40    20
No. of Ss appointing partner leader Hi Do   n    3     1    10     2
                                            %   33    12    91    33
                                    Lo Do   n    4     7     3     8
                                            %   57    88    60    80

groups shows no difference. When the Low 
Do men in Group 3 made the final decision, 
they appointed themselves as leader 40% of 
the time and their High Do women partners 
as leader 60% of the time. This was almost 
identical to Group 1 in which the Low Do 
men appointed themselves as leader 43% of 
the time and their High Do male partners 
57% of the time. 

This analysis of the decision-making proc- 
ess thus indicated that the low incidence of 
High Do women leaders in Group 3 was not 
the result of greater assertiveness by the Low
Do men but instead of reluctance by the High 
Do women to assume overt leadership over a 
male partner. 
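
The decision counts in Table 2 can be reconciled with the leadership counts for Study II in Table 1. The sketch below is illustrative only; the variable names are ours, and it assumes that a High Do S became leader either by appointing himself or by being appointed by a Low Do partner who made the decision.

```python
# Study II decision counts from Table 2, Groups 1 through 4.
hi_decided = [9, 8, 11, 6]        # High Do S made the final decision
hi_chose_self = [6, 7, 1, 4]      # ...and appointed self leader
lo_chose_partner = [4, 7, 3, 8]   # Low Do S decided and appointed the High Do partner

# High Do leaders per group: self-appointed plus partner-appointed.
hi_leaders = [s + p for s, p in zip(hi_chose_self, lo_chose_partner)]

# Percentage of High Do deciders who handed leadership to the partner.
hi_deferral_pct = [
    round(100 * (n - s) / n) for n, s in zip(hi_decided, hi_chose_self)
]

print(hi_leaders)
print(hi_deferral_pct)
```

Under these assumptions the totals agree with the Study II row of Table 1, and the Group 3 deferral rate stands apart from the other three groups.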

The first implication of these two studies 
is that when predicting leadership, counselors 
should consider not only dominance and the 
saliency of leadership in the situation (Megar- 
gee et al., 1966), but also the effect of social 
roles on the overt expression of dominance. 


Recent research by Fenelon (1966) has dem- 
onstrated that further research is necessary 
to determine which social situations inhibit 
or facilitate the expression of dominance. 
Fenelon studied pairs of High and Low Do 
white and Negro coeds using the clerical task 
described in Study II. In the biracial situa- 
tion, Fenelon found that, contrary to expecta- 
tion, the Negroes took the leader role twice 
as often as the white girls no matter what 
the relative Do scores. Further research is 
currently underway by Carl Rubinroit to 
determine if similar patterns are to be found 
among Anglo, Negro, and Mexican-American 
boys in a thoroughly integrated lower class 
high school. 

One might also infer from the present data 
and from studies such as that of Smelser 
(1961) that the reluctance of High Do women 
to assert leadership over Low Do men would 
result in less productivity and job satisfaction 
on the part of both partners. However, re-
search to date has generally overlooked hetero-
sexual work groups, and such studies must be 
done before this conclusion can be reached 
validly. Investigations comparing the effective- 
ness of groups made up of High and Low Do 
men and women in both leader and follower 
roles should employ both situations in which 
Ss select the leader and groups in which the 
leader and follower roles are assigned. It is 
quite possible that the High Do woman might 
function more effectively when she is ap- 
pointed leader than when she must assume 
the leader role on her own initiative. If 
such research demonstrates that heterosexual 
groups function better when the High Do 
partner is the leader, then further studies 
should be undertaken to determine ways in 
which the reluctance of the High Do woman
to assume leadership can be overcome. 
MEASURE OF MOTIVATION


ROBERT B. EWEN 1
New York University 


Failure to prepare for the Psychology Achievement Test of the Graduate 
Record Examination (GRE-P) may indicate low motivation. Therefore, 
GRE-P may serve as an unobtrusive measure of motivation necessary for 
success in graduate school whether or not the content of the test taps abilities 
necessary for success. To test this hypothesis, records of 31 males enrolled in 
various graduate psychology programs at NYU dating from 1960 were ob- 
tained. Predictors included GRE-P, GRE Verbal and Quantitative Aptitude 
Tests, Miller Analogies Test (MAT), undergraduate overall and under- 
graduate psychology grade-point average (GPA), and number of psychology 
courses taken prior to the GRE. Criteria included percentage of “A” grades in 
graduate school and graduation versus termination. Only GRE-P and a 
difference score consisting of GRE-P minus MAT showed significant validity 
against the criteria. The results were interpreted as supporting the hypothesis. 


The high cost of errors in the selection of 
students for admission to graduate school in 
psychology underlines the need for valid selec- 
tion procedures. As is the case in any selec- 
tion situation, acceptance of an applicant 
who fails to succeed involves considerable 
waste in time and expense on the part of both 
the institution and the applicant, while re- 
jection of an applicant who would have 
succeeded deprives the institution of a useful 
member and the applicant of a position which 
he merits. The latter type of error is difficult 
to research because the necessary conditions, 
wherein all candidates are accepted regardless 
of test scores, are rarely found in practice; 
this paper will therefore be devoted to the 
problem of acceptance of unsuccessful candi- 
dates. 

A recent monograph by Webb, Campbell, 
Schwartz, and Sechrest (1966) deals with the 
use in social science research of unobtrusive 
measures, where data are not obtained by 
interview or questionnaire. The authors pre- 
sent examples both from fiction and from real 
life: 


1 The author is indebted to Joseph Weitz for his 
assistance throughout the course of this study, and 
to Abraham K. Korman for comments and sug- 
gestions. Requests for reprints should be sent to the 
author, Department of Psychology, New York Uni- 
versity, 21 Washington Place, Third Floor, New 
York, New York 10003. 


The singular Sherlock Holmes had been reunited 
with his friend, Dr. Watson, . . . and both walked 
to Watson’s newly acquired office. The practice 
was located in a duplex of two physician’s suites, 
both of which had been for sale. . . . Holmes 
summarily told Watson that he had made a wise 
choice in purchasing the practice that he did, 
rather than the one on the other side of the duplex. 
The data? The steps were more worn on Watson’s 
side than on his competitor’s [p. 35]. 

A Chicago automobile dealer, Z. Frank, esti- 
mates the popularity of different radio stations 
by having mechanics record the position of the 
dial in all cars brought in for service. . . . These 
data are then used to select radio stations to 
carry the dealer’s advertising [p. 39]. 


Webb et al. (1966) raise cogent criticisms 
concerning the frequent use of interviews and 
questionnaires in social science research: 


We lament this overdependence upon a single, 
fallible method. Interviews and questionnaires 
intrude as a foreign element into the social setting 
they would describe, they create as well as measure 
attitudes, they elicit atypical roles and responses, 
they are limited to those who are accessible and 
will cooperate, and the responses obtained are 
produced in part by dimensions of individual dif- 
ferences irrelevant to the topic at hand. But the 
principal objection is that they are used alone 
[p. 1].


An objective test would not ordinarily be 
considered an unobtrusive measure; in fact, 
it is usually just the opposite. In exceptional 
instances, however, a written test may act as 
an unobtrusive measure. The Michigan Vo- 
cabulary Profile Test measures vocabulary 
level in human relations, commerce, govern-
ment, physical sciences, biological sciences, 
mathematics, fine arts, and sports. However, 
scores of the various subparts are not highly 
intercorrelated, indicating the absence of a 
general verbal comprehension factor. There- 
fore, Guion (1965, p. 313) has classified this 
test as a disguised interest test which reflects 
the fact that an individual is more likely to 
learn the jargon of fields of activity in which 
he is interested, rather than as a test of 
verbal ability. 

A consideration of GRE-P suggests that it 
may well serve as an unobtrusive measure of 
the motivation and interest of the applicant. 
The GRE-P is an achievement test which 
consists of objective multiple-choice items 
dealing with various areas of psychology, and 
is intended for selection at the graduate school 
level. Students know that this examination is 
not required by all universities, and hence an 
applicant to a university which does require 
this test is likely to think that the university 
regards the result as important (whether or 
not this is in fact true). 

Let us don the guise of Sherlock Holmes 
and attempt to deduce the behavior of stu- 
dents with high ability and varying degrees 
of motivation.² A highly motivated and able
student is likely to reason that this test will 
have at least some effect on his chances of 
entering the graduate school of his choice, 
and should therefore take steps to prepare 
himself for it. Since GRE-P is an achievement 
test for which studying is likely to have an 
effect, and since the student is assumed to be 
an able one, preparation should result in a 
high score. In graduate school, the combina- 
tion of high motivation and high ability 
should produce success. Therefore, high 
GRE-P scores will be related to success in 
graduate school through the underlying vari- 
able of motivation. 

A student with high ability but low motiva- 
tion, however, is unlikely to engage in be- 
havior involving preparation for GRE-P. The 
GRE-P is sufficiently comprehensive so that 
lack of preparation will prohibit a high 
score for many Ss, despite high ability. In 


² Possible alternative explanations will be con-
sidered in the Discussion section. 
graduate school, where motivation as well as
ability is necessary to produce “A” grades and 
an acceptable dissertation, the lack of motiva- 
tion is likely to have a debilitating effect on 
performance that may well not have been 
apparent at lower educational levels. Con- 
sequently, low GRE-P scores will be related 
to lack of success in graduate school. 

Thus, whether or not GRE-P is a valid 
predictor in the ordinary sense of tapping 
abilities necessary for success in graduate 
school, it may well be useful as an unobtrusive 
measure of motivation. The present study was 
designed as a first step towards evaluating 
this hypothesis. 


METHOD
Subjects 


Records of 31 males enrolled in the graduate psy- 
chology program at New York University no earlier 
than the fall of 1960 were obtained. Since sex dif- 
ferences might well be operating in this situation, 
it was decided to reserve an investigation of female 
Ss for a subsequent study. Including students from 
far in the past was inadvisable because this would 
involve dealing with selection standards that were 
out of date and no longer of interest; the year 
1960 was selected as the cutoff point because it was 
the most recent point in time that would yield a 
sample of 30 or more Ss. Since the focal point of 
the study was GRE-P, any student enrolled in 
1960 or later but for whom GRE scores (or criterion 
scores) were missing was excluded from the study. 

Sixteen Ss successfully completed the program 
and received the PhD degree; 15 were dropped by 
NYU or withdrew without either receiving this 
degree or transferring to a graduate psychology 
program at another university with a favorable 
recommendation from NYU. At the time in question, 
there were four graduate psychology programs at 
NYU: social, experimental, industrial, and clinical. 
Of the graduates, 1 was in the social program, 6 
were in experimental, 3 were in industrial, and 6 
were in clinical. Of the terminators, 2 were in social, 
6 were in experimental, 5 were in industrial, and 
2 were in clinical. This sample somewhat
underrepresents the social and clinical programs and 
overrepresents the experimental program (the termi- 
nation ratios should not be presumed to reflect the 
norm of any program), but it does at least include 
some cross section of the four programs at NYU. 


Instruments 


The predictors used in this study were the Verbal 
(GRE-V), Quantitative (GRE-Q), and Psychology 
(GRE-P) parts of the Graduate Record Examina- 
tion; the Miller Analogies Test (MAT); overall 
undergraduate GPA; undergraduate psychology GPA; 
and the number of courses in psychology taken prior 
to the GRE. The criteria were graduation versus
termination, as described above, and the percentage 
of “A” grades in graduate school. The latter criterion 
was based on the assumption that while any one 
course grade may be inaccurate, a student with a 
large proportion of A grades is likely to be 
superior academically to a student with a small 
proportion of A grades. Percentages were used to 
control for the fact that some students took fewer 
courses than others; this was especially true insofar 
as early terminators were concerned. A grades in 
three-credit courses were counted as one A, and 
A grades in six-credit courses were counted as 
two As; noncredit courses were excluded.
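
The scoring rule for this criterion can be sketched as follows. This is a hedged illustration, not the author's procedure: it assumes that every course, whatever its grade, is weighted by credits in the same way as the A grades (three credits as one unit, six credits as two), and the transcript shown is hypothetical.

```python
def percent_a_grades(courses):
    """Percentage of credit-weighted A grades.

    courses: list of (grade, credits) pairs; noncredit courses have credits == 0.
    """
    a_units = 0.0
    total_units = 0.0
    for grade, credits in courses:
        if credits == 0:
            continue  # noncredit courses are excluded
        units = credits / 3  # three credits count once, six credits count twice
        total_units += units
        if grade == "A":
            a_units += units
    return 100 * a_units / total_units if total_units else 0.0


# Hypothetical transcript for one S; not data from the study.
transcript = [("A", 3), ("B", 3), ("A", 6), ("B", 0), ("C", 3)]
pct = percent_a_grades(transcript)
print(pct)
```

Expressing the criterion as a percentage keeps an early terminator with few courses on the same scale as a graduate with many.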


Procedure 


Since raw scores on the GRE from different years 
do not always reflect the same percentile rank, and 
since some Ss took different forms of the MAT, 
raw scores on these predictors were not used. In- 
stead, the percentile rank corresponding to the 
raw score for each S was transformed to a standard 
score by means of the normal curve table, and this 
standard score was used as the score for S. Three 
additional predictors were then developed: an average 
GRE aptitude score, consisting of standard scores 
on GRE-V plus GRE-Q divided by two; the differ- 
ence between standard scores on GRE-P and average 
GRE aptitude; and the difference between standard 
scores on GRE-P and MAT. The latter two measures 
were intended to provide information regarding 
differences between level of ability and level of 
achievement. 
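
The transformation and the three derived predictors can be sketched in a few lines. This is a minimal illustration that substitutes the inverse normal distribution function for the printed normal curve table; the percentile ranks are hypothetical and the function name `percentile_to_z` is ours.

```python
from statistics import NormalDist


def percentile_to_z(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to a standard score."""
    return NormalDist().inv_cdf(percentile_rank / 100)


# Hypothetical percentile ranks for one S; not data from the study.
ranks = {"GRE-V": 95, "GRE-Q": 84, "GRE-P": 70, "MAT": 60}
z = {name: percentile_to_z(rank) for name, rank in ranks.items()}

# The three additional predictors described in the text.
avg_aptitude = (z["GRE-V"] + z["GRE-Q"]) / 2
gre_p_minus_aptitude = z["GRE-P"] - avg_aptitude
gre_p_minus_mat = z["GRE-P"] - z["MAT"]

print(round(avg_aptitude, 2), round(gre_p_minus_aptitude, 2), round(gre_p_minus_mat, 2))
```

A negative difference score indicates achievement on GRE-P below what the S's aptitude scores would suggest, which is the pattern the hypothesis associates with low motivation.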

The mean and standard deviation of each vari- 
able, and the intercorrelations among all variables, 
were then computed. A few Ss were missing data on 
some of the variables, but in no case was the N 
for any statistic below 27. In addition, partial cor- 
relation coefficients were computed between GRE-P 
and each of the criteria, partialling out the number 
of psychology courses taken prior to the GRE. 
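
The partialling step can be illustrated with the standard first-order partial correlation formula, r_xy.z = (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). In the sketch below the validity of .44 is the value reported for GRE-P against the percentage of A grades; the other two correlations are assumed values chosen only for illustration.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_grep_criterion = 0.44     # GRE-P vs. percentage of A grades (reported)
r_grep_courses = 0.30       # GRE-P vs. number of prior courses (assumed for illustration)
r_criterion_courses = 0.20  # criterion vs. number of prior courses (assumed for illustration)

r_partial = partial_corr(r_grep_criterion, r_grep_courses, r_criterion_courses)
print(round(r_partial, 3))
```

If the number of prior psychology courses accounted for the validity of GRE-P, the partial coefficient would fall toward zero; a value close to the zero-order correlation indicates that it does not.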


RESULTS 


The means and standard deviations of the 
predictors and criteria are shown in Table 1. 
The high means and small standard deviations 
for GRE-V and GRE-Q indicate that restric- 
tion of range was operating on these vari- 
ables; the sample is clearly one of high ability 
insofar as these two tests are concerned. Re- 
striction of range was less apparent on 
GRE-P, however, and did not appear to be 
operating on MAT. 

The correlations among the variables are 
shown in Table 2. For GRE-P, a validity of
.44 was obtained against the percentage of
A’s criterion (N = 31, p < .05) and a validity
of .66 was obtained against the graduation
criterion (N = 31, p < .01). The difference

TABLE 1

MEANS AND STANDARD DEVIATIONS OF
PREDICTORS AND CRITERIA

Variable                                    N      Mean          SD

 1 GRE-V                                    31     [illegible]   [illegible]
 2 GRE-Q                                    31     1.07          .84
 3 GRE-P                                    31     0.88          .62
 4 MAT                                      29     0.39          .97
 5 Average GRE aptitude                     31     1.32          .60
 6 Overall GPA (a)                          30     2.93          .41
 7 Psychology GPA (a)                       28     3.34          .49
 8 GRE-P minus average GRE aptitude         31     −0.44         .73
 9 GRE-P minus MAT                          29     0.52          [illegible]
10 Number of previous psychology courses    30     [illegible]   [illegible]
11 Percentage of “A” grades in
   graduate school                          31     38.16         33.64
12 Graduation (b)                           31     .52           [illegible]

a Maximum possible was 4.00.
b 1 = graduated with a PhD degree; 0 = terminated without
receiving PhD degree.


score involving GRE-P minus MAT showed a 
validity of .49 against the percentage of A’s 
criterion (N = 29, p < .01) and .43 against 
the graduation criterion (N = 29, p < .05). 
No other correlations between predictors and 
criteria were statistically significant. The cor- 
relation between the two criteria was .68 
(N = 31, p < .01) in the expected direction, 
with graduates receiving more A’s than ter- 
minators. 

The correlations of —.68 between MAT 
and undergraduate psychology grade-point 
average and —.43 between MAT and overall 
undergraduate grade-point average are dif- 
ficult to explain. These findings were suf- 
ficiently startling to warrant a reexamina- 
tion of the raw data, but inspection showed 
that the indicated negative trends did in fact 
exist. Since the correlations between MAT 
and GRE-V, and MAT and average GRE 
aptitude, were positive and significant in spite 
of the restriction of range of the GRE apti- 
tude measures, it would seem that if this 
finding is due to experimental error the fault 
must lie in the grade-point scores. No con- 
trol for the quality of undergraduate school 
was available, and it may be that the GPAs 
which were obtained at different schools were 
not comparable to one another and that no 
great importance should be attached to 
TABLE 2 
INTERCORRELATIONS AND PREDICTIVE VALIDITIES 

Variable                             1     2     3     4     5     6     7     8     9    10    11    12 
 1 GRE-V                            --    13    23   42*   72a   -18   -35  -39a   -21   -10   -01    17 
 2 GRE-Q                                  --    18    29   78a   -22   -11  -49a   -17    17    17    22 
 3 GRE-P                                        --    07    28   -03 [illeg.] 62a   50a    30   44*  66** 
 4 MAT                                                --  47**  -43* -68**    32  -83a    23   -28   -07 
 5 Average GRE aptitude                                     --   -27   -30  -58a   -25    06    12    26 
 6 Overall GPA                                                    -- [illeg.]  19    35    27    11    10 
 7 Psychology GPA                                                       --    24  58**    05    28    04 
 8 GRE-P minus average GRE aptitude                                           --   64a    20    28    35 
 9 GRE-P minus MAT                                                                  --    35  49**   43* 
10 Number of previous psychology courses                                                  --   -02    32 
11 Percentage of "A" grades in graduate school                                                  --  68** 
12 Graduation (b)                                                                                     -- 

Note. Decimal points are omitted. 
a Significance was not determined for correlations between a composite score and individual variables appearing in that 
composite score, since the variables in these correlations are not independent. 
b 1 = graduated with a PhD degree; 0 = terminated without receiving a PhD degree. 
* p < .05. 
** p < .01. 


these results. Shortcomings in grade-point 
measures have been noted by several writers 
(e.g., Chansky, 1964). In any case, the size 
of the negative coefficients is such that the 
lack of an explanation for these findings is 
disturbing. 

Partialling out the number of psychology 
courses taken prior to the GRE did not 
greatly affect the validities of GRE-P. The 
partial correlation with the percentage of 
A’s criterion was .47, and the partial cor- 
relation with the graduation criterion was 
.62. 
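
The first-order partial correlation used here can be checked with a short sketch. It assumes the standard formula for a partial r and uses the rounded entries of Table 2 as read here (GRE-P with number of courses = .30, number of courses with percentage of A's = -.02, number of courses with graduation = .32); because the Ns differ slightly across variables, agreement with the reported figures can only be approximate.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z partialled out of both (first-order partial r)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = GRE-P, y = criterion, z = number of previous psychology courses
partial_a = partial_corr(r_xy=0.44, r_xz=0.30, r_yz=-0.02)    # percentage of A's
partial_grad = partial_corr(r_xy=0.66, r_xz=0.30, r_yz=0.32)  # graduation

print(round(partial_a, 2), round(partial_grad, 2))
```

With these inputs the first value comes out at about .47, matching the figure reported for the percentage of A's criterion, and the second at about .62.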


Discussion 


It would be a serious error to interpret 
the findings concerning GRE-V and GRE-Q 
as evidence negative to the predictive useful- 
ness of these tests, because restriction of 
range undoubtedly prevented any significant 
validities from being obtained. In fact, this 
study (albeit inadvertently) is best viewed as 
the second stage of a multiple-cutoff procedure 
where Ss have passed the first screening based 
on ability as shown by the GRE aptitude 
tests. This is in fact ideal for purposes of 
testing the present hypothesis, since motivated 
preparation for GRE-P is likely to lead to a 
good score only if the student is capable. It 
should be stressed, however, that the findings 
of this study cannot be generalized to samples 
with average or low mean GRE aptitude 
scores. The apparent absence of restriction 


of range on the MAT indicates that selection 
was in general not based on this test, and the 
failure of the MAT to correlate with the 
criteria used in this study does therefore 
represent a substantive finding. 

The major hypothesis investigated by this 
study, that GRE-P is an unobtrusive measure 
of motivation, is supported by the significant 
validities obtained for this test against the 
two criteria. The result concerning the percentage 
of A’s criterion replicates a finding 
of Stricker and Huber (1967). These investi- 
gators obtained a significant validity of .35 
for GRE-P against overall GPA in graduate 
school, using a sample of 37 students which 
did not include any terminators and may 
therefore have resulted in underestimation of 
the size of the validity coefficient. It is also 
necessary to show, however, that motivation 
rather than knowledge of psychology is the 
intervening variable. There are three findings 
in the present study that support this con- 
tention. First, validities of GRE-P were not 
appreciably affected when the number of 
psychology courses taken prior to the GRE 
was partialled out. Second, correlations be- 
tween GRE-P and both undergraduate grade- 
point measures were not significant. As dis- 
cussed previously, however, it is possible that 
the grade-point measures were deficient, so 
this finding should be interpreted with 
caution. Third, if GRE-P is primarily a 
measure of knowledge of psychology, it should 
be correlated as highly or more highly with the 
percentage of A’s in graduate school criterion 
than with the graduation criterion. Instead, 
the correlation with graduation was numerically 
higher, and the difference between the 
two validities approached statistical significance 
(p between .10 and .05, using the 
formula for nonindependent r’s [Edwards, 
1960, p. 85]). In addition, the significant 
validities for the difference score involving 
GRE-P and MAT suggest that factors other 
than ability are operative. In view of the fact 
that the sample was a preselected one and 
that other predictors were not correlated with 
the criteria used in this study, the validities 
involving GRE-P are particularly striking. 
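
The comparison of the two GRE-P validities can be illustrated as follows. The sketch assumes that the formula for nonindependent r's is Hotelling's t for two correlations sharing one variable; the text cites only Edwards (1960), so this identification is an assumption, as are the function and variable names. The inputs are the rounded values reported above (.66, .44, and .68, with N = 31).

```python
import math


def hotelling_t(r12: float, r13: float, r23: float, n: int):
    """t test for the difference between r12 and r13 when both involve variable 1.

    Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix
    det = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    t = (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))
    return t, n - 3


# 1 = GRE-P, 2 = graduation, 3 = percentage of A's
t_value, df = hotelling_t(r12=0.66, r13=0.44, r23=0.68, n=31)
print(round(t_value, 2), df)
```

Under these assumptions t is about 1.94 with 28 df, which falls between the tabled two-tailed critical values of 1.701 (.10 level) and 2.048 (.05 level), consistent with the statement that p lies between .10 and .05.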
The results of this study should be in- 
terpreted with caution, since the sample was 
small and since unequivocal identification of 
motivation as the true intervening variable 
cannot be claimed. Alternative explanations 
of the results, perhaps in terms of a social 
science aptitude tapped by GRE-P, are pos- 
sible. Also, the present study does not at- 
tempt to identify the underlying nature of the 
motivational variable. The work of Spence, 
Taylor, and others (e.g., Spence & Farber, 
1953; Spence & Taylor, 1951; Taylor, 1951) 
concerning manifest anxiety and generalized 
drive level suggests the hypothesis that varia- 
tions in manifest anxiety better explain varia- 
tions in the level of preparation for GRE-P. 
While further research is necessary to evaluate 
these possibilities, the present results suggest 
that for a sample high in verbal and quanti- 
tative ability, GRE-P may well serve as an 
unobtrusive measure of motivation which 
will significantly improve the prediction of 
success in graduate school in psychology. 
CORRELATES OF JOB SATISFACTION AND JOB 
DISSATISFACTION AMONG FEMALE 
CLERICAL WORKERS 
Ohio University 


Degree of overall satisfaction, overall dissatisfaction, and overall satisfaction/ 
dissatisfaction were correlated with measures of satisfaction/dissatisfaction 
with several aspects of the work situation for 160 female clerical workers. 
Also, tabulations were made of responses to open-ended questions concerning 
reasons for positive and negative feelings about the company. The results of 
these analyses offered no support for the two-factor theory of job satisfaction, 
but were consistent with the traditional framework in which any variable can 
be both a “satisfier” and a “dissatisfier.” 


The two-factor theory of job satisfaction 
(Herzberg, 1966; Herzberg, Mausner, & 
Snyderman, 1959) proposes that one set of 
variables in the work situation leads to satis- 
faction but not dissatisfaction (motivator, 
satisfier, or intrinsic variables), while another 
set of variables in the work situation leads 
to dissatisfaction but not satisfaction (hy- 
gienic, dissatisfier, or extrinsic variables). 
This proposal is in contrast to the more tradi- 
tional model of job satisfaction in which any 
work-related variable may contribute to both 
satisfaction and dissatisfaction. 

Several recent studies (Ewen, Smith, 
Hulin, & Locke, 1966; Graen, 1966; Halpern, 
1966; Henrichs & Mischkind, 1967; Werni- 
mont, 1966) have reported data contradictory 
to the two-factor theory. However, in each 
of these studies satisfaction/dissatisfaction 
was measured on a single continuum. If, as 
the two-factor theory suggests, satisfaction 
and dissatisfaction are qualitatively different, 
they should be assessed separately (see 
Whitsett & Winslow, 1967, for a discussion 
of this point). 

In a more recent article (Hulin & Smith, 
1967), measures of satisfaction, dissatisfaction, 
and satisfaction/dissatisfaction were obtained 
from different randomly selected groups 
within the same company. These data are also 
contradictory to the two-factor theory for 
both male and female employees. 


1 Requests for reprints should be sent to Lawrence 
K. Waters, 45 Eden Place, Athens, Ohio. 
The present paper reports the relationships 
of selected job-related variables to separate 
measures of overall satisfaction, dissatisfac- 
tion, and satisfaction/dissatisfaction obtained 
from the same group of female clerical 
employees. 


METHOD 


The respondents in this study were 160 non- 
supervisory female employees in one regional office 
of a national insurance company. The employees 
ranged in age from the late teens to early sixties 
and all were at least high school graduates. Immedi- 
ate supervisors of all employees utilized in the study 
were also females. 

A job attitude questionnaire was administered to 
small groups of employees by the author during a 
single working day. Respondents were assured that 
their individual responses would not be made known 
to the company. Names were requested but em- 
ployees were given the option of not responding to 
that item if they “felt uncomfortable” doing so. 
Approximately 17% did not give their names. Other 
information (job title, department, etc.) obtained 
on the questionnaire probably would have been 
sufficient to identify the respondent. 

The job attitude scales were presented in booklet 
form and consisted of separate overall satisfaction 
and dissatisfaction scales (always the first two 
scales, order randomized), an overall satisfaction/dissatisfaction 
scale, the five scales of the Job 
Description Index (JDI), and a list of 11 job factors 
(arranged in alphabetical order) to be rated on a 
satisfaction/dissatisfaction scale. Ratings of satisfaction/dissatisfaction 
(both overall and for specific 
job factors) were made on a 12-point anchored 
scale and the separate satisfaction and dissatisfaction 
ratings on 7-point scales which consisted of the 
appropriate 6 points of the 12-point satisfaction/ 
dissatisfaction scale plus a seventh alternative (not 
satisfied or not dissatisfied). Immediately after each 
of the separate satisfaction and dissatisfaction scales, 
respondents were asked to indicate “one or two 
things that most influence your feelings in a posi- 
tive (or negative) way about your employment at 
the __________ company.” Responses were sorted 
independently by two judges into motivator, hy- 
gienic, wage, or unclassifiable categories. The judges 
agreed on 89.4% of the responses. Items on which 
the judges did not agree were labeled unclassifiable. 
If an employee gave only one job factor as a 
positive or negative influence, it was given a weight 
of two; if two job factors were mentioned, they 
each received a weight of one. If more than two 
factors were mentioned, only the first two were 
coded. About 84% of the respondents listed at least 
one factor as a positive influence, 76% listed at 
least one job factor as a negative influence, and 
8% did not respond to either question. 
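
The weighting rule for the open-ended responses can be sketched as below. The function names and the example responses are hypothetical; the rule itself (a single factor counts two, two factors count one each, anything beyond the second is ignored) follows the description above, and responses are assumed to have been assigned to categories by the judges already.

```python
from collections import Counter


def weight_factors(categories):
    """Weighted category counts for one respondent's answer to one question."""
    coded = categories[:2]               # only the first two factors are coded
    if len(coded) == 1:
        return Counter({coded[0]: 2})    # a single factor receives a weight of two
    return Counter(coded)                # two factors receive a weight of one each


def tabulate(all_responses):
    """Percentage of total weighted responses falling in each category."""
    totals = Counter()
    for categories in all_responses:
        totals.update(weight_factors(categories))
    n = sum(totals.values())
    return {c: round(100 * w / n, 1) for c, w in totals.items()} if n else {}


# Invented example: four respondents, one of whom gave no answer
example = [["hygienic"], ["motivator", "hygienic"],
           ["wage", "hygienic", "motivator"], []]
print(tabulate(example))
```

Under this scheme every respondent who answers contributes exactly two weighted responses, so the totals on which percentages are based equal twice the number of respondents answering each question.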


RESULTS 


The mean scores for the overall satisfaction 
(S), dissatisfaction (D), and satisfaction/dissatisfaction 
(S/D) scales were 3.58 (SD = [illegible]), 
[illegible] (SD = 1.34), and 9.02 (SD = 2.10), 
respectively. The correlation between 
S and D was —.61, and the two scales cor- 
related .78 and —.64 with S/D. To determine 
if the order in which the S and D scales were 
presented affected responses to these scales, 
S, D, and S/D mean ratings were computed 
separately for employees responding under 
each of the two presentation orders. The com- 
parisons of mean ratings between the two 
groups on corresponding scales yielded t’s 
of less than 1.00 in all three comparisons. 

Correlations between the JDI scales and 
the satisfaction/dissatisfaction rating for the 
corresponding job factor from the list of job 
factors were computed to obtain estimates 
of the consistency of employee responses. 
These correlations were JDI Work—Work = 
.70, JDI Pay-Salary = .73, JDI Promotion— 
Opportunity for Growth and Advancement = 
.63, JDI Supervision—Competent Supervision 
= .80, JDI Supervision—Considerate Super- 
vision = .79, and JDI Co-workers—Co-workers 
= .61. In all cases, the job factor correlated 
higher with the appropriate JDI scale than 
with any other job factor from the list, and 
the correlation between the JDI scale and the 
corresponding job factor was the largest cor- 
relation that the JDI scale had with any job 
factor. 

According to the two-factor model, mo- 
tivator variables should be related to degree 
of satisfaction but not degree of dissatisfac- 


TABLE 1 

CORRELATIONS BETWEEN JDI SCALES AND 
OVERALL JOB SATISFACTION 

                 Overall 
                 satisfaction-      Overall          Overall 
JDI scales       dissatisfaction    satisfaction     dissatisfaction 

Work              .53**              .62**           -.45** 
Pay               .29**             [illegible]      -.25** 
Promotion        [illegible]         .20*            -.19* 
Supervision       .41**             [illegible]      -.32** 
Co-workers       [illegible]        [illegible]      -.13 

* p < .05. 
** p < .01. 

tion, and hygienic variables should be related 
to dissatisfaction but not satisfaction. No 
predictions concerning the relationship of 
motivator and hygienic factors with overall 
satisfaction/dissatisfaction can be made since 
the degree of overall satisfaction/dissatisfaction 
is presumed to be some unspecified composite 
of positive and negative influences. The 
traditional model, on the other hand, would 
predict that any job-related variable may cor- 
relate with both satisfaction and dissatisfac- 
tion. 

The correlations of the five JDI scales and 
overall S, D, and S/D are shown in Table 1. 
In general, the relationships of the JDI scales 
tended to be somewhat larger with S than D, 
but, except for the JDI Co-worker—D-scale 
coefficient, all of the correlations were sig- 
nificant and the order of the three more highly 
related JDI scales was the same for both S 
and D. One of the problems in interpreting 
these data in terms of the two-factor frame- 
work is the classification of job-related factors 
as either motivator or hygienic. Ewen et al. 
(1966) and Hulin and Smith (1967) classified 
JDI Work and Promotion as motivators and 
Pay as a hygienic. Supervision and Co-work- 
ers were considered ambiguous in terms of 
classification. Whitsett and Winslow (1967) 
objected to the use of Promotion as a mo- 
tivator and Pay as a hygienic (p. 399). 
Henrichs and Mischkind (1967) deleted pay 
as a hygienic variable. The only point of 
agreement seems to be that JDI Work is a 
motivator. As a motivator, it should be cor- 
related with S but not D according to the 
two-factor model. For the female sample of 
the present study, JDI Work had the highest 
correlation of the scales with all three of S, D, and 
S/D, which is directly opposed to the 
motivator-hygienic framework. With the exception 
of Co-workers, the other JDI scales 
(however classified for the conditions of this 
study) were significantly related to S, D, 
and S/D. 

The relationships of ratings on different 
aspects of the job as obtained from the list of 
job factors to S, D, and S/D are given in 
Table 2. As was the case for the JDI scales, 
correlations with S tended to be somewhat 
larger than with D. With the exception of Co- 
workers, all of the correlations (for both mo- 
tivator and hygienic factors) were significant 
and the patterns of relationships with S and D 
were very similar (rho = .82). These data 
support the traditional theory and conflict 
with the two-factor model. 

Although neither the traditional nor the 
two-factor model makes predictions concerning 
the relative potency of motivator and 
hygienic factors, several studies have reported 
generally greater saliency for motivator than 
hygienic variables (Ewen et al., 1966; 
Halpern, 1966; Hulin & Smith, 1967; 
Wernimont, 1966). The mean r’s for both 
motivator and hygienic clusters were com- 
puted (using Fisher’s z’ transformation) to 
estimate the relative potency of the two 
classes of variables. For overall S, D, and S/D 
scales, motivators were generally more highly 


TABLE 2 

CORRELATION BETWEEN SATISFACTION/DISSATISFACTION 
(S/D) WITH DIFFERENT ASPECTS OF 
THE JOB AND OVERALL SATISFACTION 

                                          Overall       Overall       Overall 
Job factors                               S/D           S             D 

Motivator 
  Opportunity for Growth and 
    Advancement                            .50**        [illegible]   -.36** 
  Responsibility on the Job               [illegible]    .41**        -.37** 
  Recognition for Work Done               [illegible]    .40**        -.28** 
  Sense of Achievement                    [illegible]   [illegible]   -.36** 
  Work                                     .50**        [illegible]   -.45** 
  Mean r                                   .48           .44          -.36 
Hygienic 
  Competent Supervision                    .44**         .44**        -.40** 
  Considerate Supervision                  .30**        [illegible]   -.31** 
  Company Policies & Practices             .29**         .29**        -.16* 
  Co-workers                              [illegible]    .14          -.06 
  Physical Working Conditions              .24**         .23**        -.24** 
  Mean r                                   .28           .29          -.24 
Salary                                    [illegible]   [illegible]   -.28** 

* p < .05. 
** p < .01. 
TABLE 3 

CLASSIFICATION OF REASONS FOR POSITIVE 
AND NEGATIVE FEELINGS 

Feelings               Motivator   Hygienic   Wages    Unclassified 

Positive (N = 268)       16.8%       70.5%      5.6%       7.1% 
Negative (N = 242)       15.7%       49.6%     20.2%      14.5% 





related than the hygienic cluster. The data 
support previous results. 
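
Averaging correlations through Fisher's z' transformation, as was done for the cluster means, can be sketched as follows. The function name is hypothetical, and the input is the Overall D column for the five hygienic factors as read from Table 2 (-.40, -.31, -.16, -.06, -.24).

```python
import math


def mean_r_fisher(rs):
    """Average a set of correlations via Fisher's z' (inverse hyperbolic tangent)."""
    zs = [math.atanh(r) for r in rs]       # r -> z'
    return math.tanh(sum(zs) / len(zs))    # mean z' -> r


hygienic_d = [-0.40, -0.31, -0.16, -0.06, -0.24]
mean_hygienic_d = mean_r_fisher(hygienic_d)
print(round(mean_hygienic_d, 2))
```

For this column the result rounds to -.24, the mean r shown for the hygienic cluster. The transformation matters little for coefficients of this size but guards against the bias that arises when larger r's are averaged directly.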

The results from the open-ended questions 
concerning factors influencing feelings about 
the company in a positive or negative way are 
presented in Table 3. The percentage of 
weighted responses classified as motivators was 
almost the same for positive and negative 
influences. However, if wages are excluded 
from the hygienic category, more hygienic 
factors were mentioned as positive influences 
than negative influences. Considering wages 
as part of the hygienic classification yielded 
similar percentages for positive and negative 
influences (76.5 and 70.0, respectively). How- 
ever, for either definition of the hygienic set, 
these data are contradictory to the two-factor 
theory. 


Discussion 


The results of the correlational analysis 
and tabulation of responses to the open-ended 
questions were contradictory to the two-factor 
theory. In the correlational analysis the 
patterns of correlations for the various aspects 
of the job were very similar whether S, D, or 
S/D was used as the measure of overall at- 
titude about the job. Motivators performed 
as both satisfiers and dissatisfiers, and hy- 
gienic factors were related to both satisfaction 
and dissatisfaction. In addition, the sub- 
stantial correlation between overall satisfac- 
tion and overall dissatisfaction seems to offer 
little in support of the contention of the two- 
factor model that satisfaction and dissatis- 
faction are qualitatively different. Responses 
to the open-ended questions indicated that 
hygienic aspects were mentioned more fre- 
quently as reasons for positive feelings than 
negative feelings. 

These results, especially in conjunction 
with the Hulin and Smith (1967) study, indicate 
that even when satisfaction and dissatisfaction 
are assessed separately as suggested 
by the proponents of the two-factor 
model, no support is found for the theory for 
female employees. Both of these studies agree 
in conclusion with previous research (Ewen 
et al., 1966; Graen, 1966; Halpern, 1966; 
Henrichs & Mischkind, 1967; Wernimont, 
1966) using a single overall satisfaction/dissatisfaction 
scale. The weight of these studies 
suggests that the results supporting the two-factor 
model are method-bound and that the 
model offers little to the understanding of 
worker attitudes. 
CREATIVITY AND ACADEMIC MAJOR: 
BUSINESS VERSUS ENGLISH MAJORS 1 


RUSSELL EISENMAN 2 
Temple University 


Business and English majors were compared on two measures of creativity, 
unusual uses for common objects and the Personal Opinion Survey, in order 
to explore further the interpretation of Maier and Hoffman (1961) regarding 
the inhibiting organizational effects on creativity. Chi-square tests revealed 
that English majors were superior to business majors on both creativity tests 
(p’s < .01) suggesting that a selective factor is operative: English attracts 
students high in creativity while business attracts students low in creativity. 
These results make the Maier and Hoffman emphasis on organizational effects 
somewhat dubious, since their results can be explained on the basis of this 
selective factor. Further comparison of the 48 Ss with 229 Ss previously tested 
indicated that business majors were significantly low on creativity while 
English majors were significantly high. 


Maier and Hoffman (1961) reported an in- 
vestigation of creativity in groups varying in 
the amount of experience and identification 
with existing organizations. Their expressed 
purpose was to consider the effect of experi- 
ence and identification with business and in- 
dustrial organizations on creativity. Employ- 
ing a role-playing case, the Change of Work 
Procedure problem (Maier, 1955), the authors 
studied creative problem solving in four 
groups: (a) an employed group, consisting of 
industrial foremen, airline managers, training 
directors, hospital managers, and nursing supervisors; 
(b) business administration students; 
(c) psychology of human relations 
students; and (d) introductory psychology 
students. The results clearly indicated that the 
greater the experience and identification with 
business or industry the lower the creativity. 
Maier and Hoffman (1961) concluded that 
“The results of this study provide suggestive 
empirical support for the proposition that the 
usual formal authority structure found in 
present-day organizations tends to inhibit the 
expression of the creative potential of their 
members [p. 279].” 

Since the production of integrative—crea- 
tive—solutions by groups with little or no 
identification and experience in business was 


1 This research was aided by a grant from the 
Trustees of Temple University. 

2 Requests for reprints should be sent to the au- 
thor, Department of Psychology, Temple University, 
Philadelphia, Pennsylvania 19122. 


three times as large as the proportion of in- 
tegrative solutions by Ss more identified with 
business or industrial organizations, there 
seems little room for doubt that Ss were dif- 
ferent in their performance on the assigned 
task. To the extent that the role-playing task 
effectively measures creativity, the business- 
industrial Ss performed markedly lower in 
creativity. However, in spite of the great dif- 
ference between business versus nonbusiness 
Ss there is one major conceptual problem, 
which leaves the basis of the results in ques- 
tion. Maier and Hoffman (1961) favor the 
interpretation that the formal authority sys- 
tem inhibits creativity by promoting an at- 
mosphere in which employees simply wish to 
do what the boss considers the right thing to 
do. While their results are consistent with 
such an explanation, there is yet another 
possible interpretation which needs to be 
considered. Could it not be that there is a 
selective factor operating, such that business 
and industry tend to attract Ss who are rela- 
tively lower in creativity than persons in 
other fields? If this selectivity were accepted 
as the explanation of the Maier and Hoffman 
results, it would be unnecessary to explain the 
findings in terms of length of association and 
identification with business or industry. In- 
stead, the major explanatory power would be 
carried by the concept of initial differences 
among the groups tested, such that business 
majors in college are relatively low in creativity 
even before their employment in business 
or industrial organizations. 

The major difference between the possible 
explanation advanced here and the one favored 
by Maier and Hoffman (1961) is that their 
explanation emphasizes the inhibiting effect 
of organizational authority, while the current 
hypothesis points out that organizational de- 
mands need not be called upon to explain 
their results. Instead, individual differences 
may be present in Ss in various fields. The 
specific hypothesis of the present study was 
that business students would score lower on 
two creativity tests relative to English majors. 
The two creativity tests employed were an 
unusual uses test in which S is required to 
give uses for common objects, with the higher 
scores going to Ss who produce original (sta- 
tistically infrequent) responses (Eisenman & 
Robinson, 1968); and the Personal Opinion 
Survey (Eisenman, 1968), a 30-item paper- 
and-pencil personality measure of creativity. 
English majors were chosen because it is of 
interest to see if their alleged interest in 
literary originality would be associated with 
a tendency to accept the unreal and the 
unusual. If such a tendency exists among 
English majors it should increase their crea- 
tivity, since creativity is frequently concep- 
tualized as an ability to ignore the obvious 
and consider unusual ways of doing things 
(Barron, 1963; Taylor, 1964). 


METHOD 
Subjects 


The Ss were 48 students at Temple University, in- 
cluding 20 English majors and 28 business majors. 
These Ss constituted all the business and English 
majors from two classes with a total of 65 students, 
one a business course and one a course in English. 
The Ss were tested by an assistant during their class, 
with the permission of their teachers. 


Measures 


Unusual uses is a common creativity measure with 
different investigators using a similar format, namely, 
Ss are presented with the names of common objects 
and asked to list all the uses they can think of for 
such objects. Originality is defined as statistical infrequency. 
For example, in the present study any 
use which appeared less than 5% of the time in the 
present samples was deemed original. It is also possible 
to obtain a fluency score simply by adding the 
number of valid responses, considering as invalid the 
basic repetition of a theme. “Build a house” and 
TABLE 1 

NUMBER OF Ss ABOVE AND BELOW THE MEDIAN (Mdn 
= 23.50) ON TOTAL NUMBER OF ORIGINAL 
USES FOR OBJECTS ON AN UNUSUAL 
USES TEST 

Classification       Above Mdn     Below Mdn 
Business major            9            19 
English major            15             5 

Note. χ² = 6.94, df = 1, p < .01. 


“build a dog house” would be considered only one 
response since they both employ the concept of 
building. However, correlations of fluency and 
originality are often in the .80’s-.90’s (Eisenman, 
1969; Madaus, 1967) so fluency was not evaluated 
in the present study. 
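
Scoring originality as statistical infrequency can be illustrated with a brief sketch. It rests on assumptions the text does not settle: "less than 5% of the time" is taken here to mean that fewer than 5% of Ss gave the use, responses are assumed to have been reduced to a common wording beforehand, and all names and data are hypothetical.

```python
from collections import Counter


def originality_scores(responses, threshold=0.05):
    """Count, for each S, the uses given by fewer than `threshold` of all Ss.

    `responses` maps an S identifier to the list of uses that S listed for one object.
    """
    n_subjects = len(responses)
    # Number of Ss giving each use (a use repeated by the same S counts once)
    counts = Counter(use for uses in responses.values() for use in set(uses))
    original = {use for use, c in counts.items() if c / n_subjects < threshold}
    return {s: sum(1 for use in set(uses) if use in original)
            for s, uses in responses.items()}


# Invented data: 40 Ss listing uses for "bricks"
sample = {f"s{i}": ["build a wall"] for i in range(40)}
sample["s0"].append("doorstop")            # given by 1 of 40 Ss (2.5%): original
for s in ("s1", "s2", "s3"):
    sample[s].append("paperweight")        # given by 3 of 40 Ss (7.5%): not original

scores = originality_scores(sample)
print(scores["s0"], scores["s1"])
```

A total originality score per S would then be the sum of such counts over the five objects.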

The Personal Opinion Survey (Eisenman, 1968; 
Eisenman & Robinson, 1967) is a 30-item, true-false, 
paper-and-pencil personality measure of creativity. 
The test is composed of five short-form tests of six 
items each taken from Child (1965). The subtests 
are tolerance for complexity, tolerance for ambiguity, 
scanning, independence of judgment, and regression 
in the service of the ego. The overall score is used, 
with a maximum possible score of 30. Higher scores 
indicate greater creativity. Odd-even reliability has 
been found to be .86 with the Spearman-Brown 
prophecy formula. 
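
The Spearman-Brown step-up for a test split into two halves can be sketched as follows. The sketch assumes that the reported .86 is the stepped-up (full-length) reliability; the half-test correlation it implies is back-calculated here and is not a figure given in the source.

```python
def spearman_brown(r_half: float, k: float = 2.0) -> float:
    """Reliability of a test lengthened k times, given the reliability r_half of one part."""
    return k * r_half / (1 + (k - 1) * r_half)


def implied_half_r(r_full: float) -> float:
    """Odd-even correlation implied by a stepped-up (k = 2) reliability."""
    return r_full / (2 - r_full)


r_half = implied_half_r(0.86)
print(round(r_half, 2), round(spearman_brown(r_half), 2))
```

On this reading, a full-test reliability of .86 corresponds to a correlation of about .75 between the odd and even halves.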

Since the Personal Opinion Survey was machine 
scored, interscorer reliability is not an issue. With 
the unusual uses test, Ss had to list uses for five 
objects: bricks, spoons, paper clips, paper, and tooth- 
picks. Interscorer agreement ranged from 70-89% 
for the various objects. Reasonable validity can be 
claimed for both creativity measures; the reader is 
referred to Eisenman (1968, 1969). 

Although no specific time limit was mentioned, the 
implied time limit was the 50 min. of the class period. 


RESULTS 


The performance for business versus En- 
glish majors is shown in Table 1 for the un- 
usual uses test. It is apparent that on this 
creativity measure the English majors were 
significantly superior to the business students. 

Table 2 shows the relative performance of 


TABLE 2 


NUMBER OF Ss ABOVE AND BELOW MEDIAN (Mdn 
= 17.58) ON PERSONAL OPINION SURVEY 

Classification       Above Mdn     Below Mdn 
Business major            8            20 
English major            16             4 

Note. χ² = 10.37, df = 1, p < .01. 
business and English majors on the second 
creativity measure, the Personal Opinion Sur- 
vey. Again, the English majors were sig- 
nificantly more creative. 
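
The chi-square values in Tables 1 and 2 can be reproduced from the cell frequencies. The source does not say which formula was used; the sketch below assumes the 2 x 2 chi-square with Yates' correction for continuity, which yields the reported values, whereas the uncorrected statistic gives larger ones (about 8.57 and 12.34).

```python
def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> float:
    """Chi-square for the 2 x 2 table [[a, b], [c, d]], optionally with Yates' correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)   # continuity correction
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: business, English; columns: above Mdn, below Mdn
chi_unusual_uses = chi_square_2x2(9, 19, 15, 5)
chi_opinion_survey = chi_square_2x2(8, 20, 16, 4)
print(round(chi_unusual_uses, 2), round(chi_opinion_survey, 2))
```

Both values exceed 6.635, the tabled critical value of chi-square for 1 df at the .01 level.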

The possibility remained that the results 
were due to either (a) the normal achievement 
of business majors on these measures but the 
markedly high performance of English majors, 
relative to other samples of Ss; or (b) the 
normal achievement of English majors but the 
markedly inferior scores of business majors. 
As a partial attempt to answer this, the scores 
of business and English majors were com- 
pared with a normative sample of 229 Ss re- 
ported in Eisenman (1968). These Ss, com- 
posed mainly of college undergraduates, but 
including a smattering of professional men 
and laborers, obtained a mean score of 18.87. 
The business majors’ mean of 16.80 is significantly 
lower via a t test (p < .05) while 
the English majors’ mean of 20.05 is significantly 
higher, again by a t test with significance 
beyond the .05 level. Therefore, both 
(a) and (b) above can be ruled out and the 
conclusion to be drawn is that, relative to the 
229 Ss, the business majors are low in crea- 
tivity and the English majors are high, as 
measured by the Personal Opinion Survey. 


DISCUSSION 


The finding of low creativity in business 
majors even before they begin employment 
in business or industrial organizations sug- 
gests that for some reason business does not 
attract very creative students. The implica- 
tions of this selective factor for the interpreta- 
tion of Maier and Hoffman’s (1961) study 
would be to suggest that while it is possible 
that organizations inhibit creativity, with 
greater length of time in organizations as- 
sociated with lessened creative performance, 
the persons who go into these organizations 
tend, by and large, to be relatively low in 
creativity. Perhaps the interpretation empha- 
sized here complements the Maier and Hoff- 
man view. If relatively noncreative people are 
attracted to business as an academic major 
then they are not likely as leaders to foster 
creativity on the part of their subordinates. 

The higher creativity among English majors 
is also meaningful, and may serve as a basis 
to infer the differences between business and 
English majors. Barron (1963) has empha- 
sized how rigid control, interest in the con- 
crete and safe, and a generally simple life 
style are inconsistent with creativity. His 
creative Ss prefer complexity both in aes- 
thetic preference and in their everyday life, 
in contrast to the noncreative Ss who achieve 
whatever prominence they attain by a more 
conventional adhering to the rules of society. 
Business tends to place a strong emphasis on 
the practical, everyday issue of economic 
competition, while English is a field with more 
emphasis on fantasy, the “inner life” of man, 
and less emphasis than business on concrete 
matters. Conceived in this way, the field of 
business adopts a stance that is more like that 
of Barron’s low creative Ss, while the field of 
English shows an interest in areas Barron 
finds correlated with high creativity. 

The results have further implications for 
student reaction to college climates. It may be 
predicted that business majors would more 
readily support the conservative organiza- 
tions, formal and informal, which oppose stu- 
dent attempts to engender change on the 
college campus. It might also be supposed that 
English majors would not be so likely to sup- 
port campus authority. Whether or not En- 
glish majors might be overrepresented among 
students pressing for change would depend 
additionally on their propensity for action 
versus their tendency to think rather than to 
do. An unpublished study of a campus boycott 
(Eisenman, Aserinsky, & Robinson, 1969) 
provides modest support for one of the above- 
mentioned predictions. A large sample of 
students yielded only five students who ac- 
tively violated a student boycott of the school 
cafeteria, thereby supporting the existing au- 
thority and opposing the peer group. All five 
were business majors. While such evidence is 
admittedly slim, it is entirely consistent with 
the results obtained in the present study. The 
present study, like that of Maier and Hoff- 
man (1961), suggests that business does not 
attract highly creative persons to any great 
extent. 

An experiment was conducted to examine the relationship between overt 
forced-choice preferences and response latencies. A group of 24 males made 
66 paired comparisons of 12 stimuli, indicating their preferences on a key- 
board and basing their judgments on aesthetic considerations. The results in- 
dicated a systematic relationship between overt expressions of preference and 
the time required to indicate this choice on the keyboard. The most preferred 
stimuli yielded the shortest latencies, while the least preferred stimuli 
yielded the longest latencies. These results are in line with those demon- 
strated in previous studies employing considerably different procedures and 
conditions. However, the present procedure tended to yield a somewhat more 
linear relationship than did the previous studies. 


A series of recent studies has demonstrated 
a significant relationship between the judged 
affective value of a wide range of visual and 
auditory stimuli and the latency of the judg- 
mental response. In general, this relationship 
tends to take the form either of an essentially 
linear function (Bergum & Lehr, 1966; 
Bergum, Lehr, & Dooley, 1967) or of an in- 
verted J (Bergum & Lehr, 1967; Lehr, Ber- 
gum, & Standing, 1966), with positively af- 
fective stimuli yielding short latencies in all 
cases and negatively affective stimuli yielding 
long latencies. In these earlier studies, how- 
ever, the failure of negatively affective stimuli 
to yield consistently longer latencies than 
neutral stimuli could not be explained readily 
on the basis of the available data, although a 
number of hypotheses seemed possible. These 
included greater affective ambivalence, or un- 
certainty, toward the neutral stimuli, changes 
in the affective base line as new stimuli were 
experienced in the test situation, and the in- 
herent unreliability of single response evalua- 
tions. Unfortunately, none of these possi- 
bilities could be tested adequately with the 
procedures employed in the earlier studies. 

The purpose of the present experiment was 
thus to remedy this situation in part by 
employing a procedure where the basic reli- 
ability of the evaluation responses would be 
more assured, and where shifts in the affective 
base line could be controlled. This pro- 
cedure, paired-comparisons, had the added 
virtue of providing more precise and reliable 
estimates of the affective distance between 
the stimuli in the sample. 

1 Requests for reprints should be sent to Bruce O. 
Bergum, Xerox Corporation, Research Laboratories, 
800 Phillips Road, Webster, New York 14580. 


METHOD 
Subjects 


Twenty-four adult, college-educated males served 
as Ss in this experiment. All were members of the 
Xerox Research Laboratories staff. 


Apparatus 


Apparatus for the experiment included a Sawyers 
35mm. slide projector, a rear-projection screen, a 
keyboard and associated event recorder, and an 
audio tape recorder for presenting instructions. 


Materials and Conditions 


Six different types of stimuli were employed. 
These included a typewritten discussion of calendars, 
a table of salaries for scientists, a line graph, a bar 
chart, a map showing the distribution of scientists 
in the United States, and an exploded engineering 
drawing of an idler assembly. A colored and a black 
and white version were prepared of each of the six 
stimulus types and all possible pairings of the 12 
stimuli were photographed on 35mm. slides. The 
position of each stimulus was counterbalanced so 
that half of the time it appeared in the right half 
or lower position on the slide. The members of the 
pairs were labeled “1” or “2” on each slide to 
indicate their correspondence to the similarly labeled 
response keys. In all, a total of 66 such slides were 
prepared. 

Each S viewed the 66 stimulus pairings in a 
different random order to control for possible practice 
effects and ordering effects among the stimuli. 
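
The pairing scheme described above can be checked with a short sketch. The stimulus labels, the `presentation_order` helper, and the use of the S's number as a random seed are illustrative assumptions rather than details taken from the report; the sketch only confirms that 12 stimuli yield 66 unordered pairs and shows one way each S could be given a different random order.

```python
import random
from itertools import combinations

# Hypothetical shorthand labels for the six stimulus types in two versions.
TYPES = ["Text", "Table", "Graph", "Chart", "Map", "Drawing"]
VERSIONS = ["Color", "Black"]
STIMULI = [f"{t}-{v}" for t in TYPES for v in VERSIONS]

# Every unordered pairing of the 12 stimuli: one slide per pair.
PAIRS = list(combinations(STIMULI, 2))


def presentation_order(subject_id):
    """Return all pairs in a random order specific to one S.

    Seeding with the subject number is an assumption made here so that the
    example is reproducible; the report does not say how orders were drawn.
    """
    rng = random.Random(subject_id)
    order = PAIRS[:]
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    print(len(STIMULI), "stimuli,", len(PAIRS), "slides")
    print(presentation_order(1)[:3])
```

Each stimulus appears in 11 of the 66 pairs, so across 24 Ss every stimulus enters 264 judgments.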


FORCED-CHOICE PREFERENCES AND RESPONSE LATENCY 


The Ss were seated in front of the rear-projection 
screen and instructed first to view the stimulus pair 
and then to indicate which member they most pre- 
ferred by depressing the appropriate response key. 
They were specifically instructed to employ aesthetic 
qualities as the basis for judgment to the extent that 
this was possible, and each S was allowed to proceed 
at his own pace until all 66 judgments had been 
made. Response measures included both overt 
preferences and response latencies. 


RESULTS 


The data were treated in three different 
ways. First, the total number of times each 
stimulus was chosen over all other stimuli was 
determined across all Ss, and these data con- 
verted to percent preferences. Second, the 
mean preference response latency was deter- 
mined across all Ss for all comparisons involv- 
ing each given stimulus. Finally, the 12 
stimuli were rank-ordered both in terms of 
the percent preference results and the latency 
results and a rank-order correlation performed 
between these two sets of ranks. 
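
The percent-preference bookkeeping can be sketched as follows. The percentages are the eleven entries of Table 1 that are legible in this copy (the Map-Black percentage is not); the names `LEGIBLE_PERCENTS` and `choice_count` are illustrative. Because every judgment selects exactly one stimulus, the choice counts must sum to 66 pairs x 24 Ss, which lets the missing entry be estimated. That estimate is an inference from the other entries, not a reported value.

```python
N_SUBJECTS = 24
N_STIMULI = 12

# Each stimulus appears in 11 pairs, judged once by each of 24 Ss.
PER_STIMULUS = (N_STIMULI - 1) * N_SUBJECTS
# Total judgments: 66 pairs x 24 Ss, each choosing exactly one stimulus.
TOTAL_CHOICES = (N_STIMULI * (N_STIMULI - 1) // 2) * N_SUBJECTS

# Percent preferred, as legible in Table 1 (Map-Black is unreadable).
LEGIBLE_PERCENTS = {
    "Map-Color": 89.8, "Drawing-Color": 80.7, "Chart-Color": 73.5,
    "Graph-Color": 59.8, "Chart-Black": 52.3, "Table-Color": 50.4,
    "Text-Color": 38.3, "Graph-Black": 35.2, "Drawing-Black": 21.6,
    "Table-Black": 20.5, "Text-Black": 19.7,
}


def choice_count(percent):
    """Convert a percent preference back to a whole number of choices."""
    return round(percent * PER_STIMULUS / 100)


legible_counts = {name: choice_count(p) for name, p in LEGIBLE_PERCENTS.items()}
implied_count = TOTAL_CHOICES - sum(legible_counts.values())
implied_percent = 100 * implied_count / PER_STIMULUS

if __name__ == "__main__":
    print(f"Implied Map-Black preference: {implied_percent:.1f}%")
```

If the eleven legible entries are accurate, the remaining stimulus would have been chosen roughly 58% of the time, which is consistent with its preference rank of 5.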

Table 1 lists the 12 stimuli with their 
associated percent preferences, response la- 
tencies, and their associated rank orders. 
Casual inspection of the two sets of ranks 
suggests a high degree of relationship between 
the relative strength of overtly expressed 
preferences and the response latencies as- 
sociated with these judgments. The rank-order 
correlation between these two sets of data is 
.93, p< .01, indicating that both methods 
are, in fact, measuring very nearly the same 
thing. 
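
As a check on the reported coefficient, the rank-order (Spearman) correlation can be recomputed from the two rank columns of Table 1. This is a minimal sketch; the function name is illustrative, and the latency rank for Graph-Color is taken to be 4, the only rank otherwise missing from that column.

```python
# Ranks listed in Table 1 order (Map-Color first, Text-Black last).
PREFERENCE_RANK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
LATENCY_RANK = [1, 3, 2, 4, 5, 8, 6, 9, 10, 7, 12, 11]


def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-order correlation for untied ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


rho = spearman_rho(PREFERENCE_RANK, LATENCY_RANK)

if __name__ == "__main__":
    print(f"rho = {rho:.2f}")
```

The squared rank differences sum to 20, giving a coefficient that rounds to the .93 reported in the text.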


TABLE 1 
PREFERENCES AND LATENCIES 

Stimulus         % preferred    Rank   Latency       Rank 
Map-Color        89.8            1     1.03           1 
Drawing-Color    80.7            2     [illegible]    3 
Chart-Color      73.5            3     1.05           2 
Graph-Color      59.8            4     1.26           4 
Map-Black        [illegible]     5     1.31           5 
Chart-Black      52.3            6     1.41           8 
Table-Color      50.4            7     1.32           6 
Text-Color       38.3            8     1.42           9 
Graph-Black      35.2            9     1.47          10 
Drawing-Black    21.6           10     1.36           7 
Table-Black      20.5           11     1.67          12 
Text-Black       19.7           12     1.61          11 


DISCUSSION 


The demonstration of an essentially linear 
relationship between overt preferences and 
choice-response latencies is in direct support 
of the results reported earlier in that the 
more preferred stimuli consistently yielded 
shorter latencies than did the less preferred 
stimuli. In the present case, however, the 
tendency for the neutral, or middle-ground, 
stimuli to yield the longest latencies was not 
observed, suggesting that the earlier results 
may well have been an artifact relating to the 
testing procedures employed. It seems likely 
that the longer latencies demonstrated with 
the earlier procedures resulted from the 
uncertainty surrounding the choice of “neutral,” 
where the normal set was to respond in terms 
either of a positive or a negative vector, that 
is, the “neutral” response tended to represent 
a balance of positive and negative affect rather 
than a simple lack of affect. The paired- 
comparison procedure has the virtue that the 
judgments demanded of S are basically 
simpler (since the recall component is greatly 
reduced), and the repeated exposures to all 
of the stimuli tend to reduce any ambiguity 
relating to any given stimulus as well as to 
establish a relatively well-defined frame of 
reference in which to order the affective judg- 
ments. When the artifacts are thus removed, 
the result is the observed systematic, near- 
linear relationship between response latency 
and affective value. 

From the applications point of view, it is 
conceivable that choice-response latencies 
might prove to be both more reliable and 
valid than the actual overt choices themselves, 
since the latencies are unaffected by inad- 
vertent keyboard errors, and may be less 
prone to errors relating to Ss’ expectations of 
what they think E may want them to prefer. 

A VALIDATION STUDY OF POLYGRAPH 
EXAMINER JUDGMENTS 1 


PHILIP J. BERSH 2 


Temple University 


The lie detection judgments of polygraph examiners in criminal investiga- 
tions conducted by the military services were validated against unanimous 
guilt-innocence decisions by a panel of four Judge Advocate General (JAG) 
attorneys. Since the study did not permit isolation of the role played by the 
polygraph record itself, the examiner’s judgment was considered the end 
product of his complete interrogation of a suspect. Each JAG attorney made 
an independent decision based upon perusal of case files from which all poly- 
graph references were deleted. Attorneys were instructed to eliminate files 
lacking sufficient evidence and to disregard legal technicalities. Level of agree- 
ment was 92.4%. Percentage of agreement decreased significantly to 74.6% 
when the criterion was a majority JAG panel decision. The study supports 
the use by the military services of polygraph examiner judgments as an aid 
in determining whether to continue or to terminate the investigation of a 
suspect. 


Criminal investigations conducted by the 
military services may include interrogations 
of suspects with the aid of the polygraph. In 
such cases the polygraph examiner’s judg- 
ment concerning the truth of the suspect’s 
replies to polygraph test questions often 
determines whether the investigation of the 
suspect should be continued or terminated. 
Yet a recent survey of lie detection by the 
polygraph method (Orlansky, 1964) has 
pointed to the almost complete lack of objec- 
tive evidence bearing upon its reliability and 
validity for applications of this kind. Lab- 
oratory experiments, including those in which 
crimes have been simulated (e.g., Davidson, 
1968; Kubis, 1962), have generally tended to 
support the effectiveness of the polygraph, 
and in particular that of the GSR indicator, 
for the detection of deception. However, it is 
often contended that the results of such ex- 
periments are inapplicable to live cases because 
of presumed radical differences in S’s mo- 
tivation to deceive and in his overall level of 
emotion. 

1 The study was sponsored by the Department of 
Defense Research & Engineering Joint Working 
Group on Lie Detection, of which the author was a 
member. Data collection was carried out by Robert 
Brisentine, Office of the Provost Marshal General, 
United States Army, also a member of the Working 
Group. 

Special thanks are due to S. Rains Wallace, original 
chairman of the Working Group, and to Jesse 
Orlansky, current acting chairman of the Group, 
for their valuable comments and suggestions through- 
out the course of this investigation. Helpful sugges- 
tions were also received from other members of 
the group. 

2 The study was performed while the author was 
employed by the United States Army Behavioral 
Science Research Laboratory, Washington, D. C. 
Requests for reprints should be sent to the author, 
Department of Psychology, Temple University, Phila- 
delphia, Pennsylvania 19122. 

One major advantage of laboratory over 
real-life studies of the polygraph is that lab- 
oratory controls make it possible to insure 
that lie detection judgments are based solely 
on the polygraph record. In such a context 
as a criminal investigation, on the other hand, 
the contribution made by the polygraph itself 
to the detection of deception is extremely 
difficult to isolate. The polygraph examination 
proper is embedded in an interrogation that 
includes a pretest interview and sometimes a 
posttest interrogation. In addition, the poly- 
graph examiner has access to the case file and 
has ample opportunity to interact with the 
criminal investigator prior to the conduct of 
the examination. Each of these extrapolygraph 
forms of information is a potential source of 
cues which may have significant influence 
upon the examiner’s judgment. 

The present study was performed to assess 
the validity of lie detection judgments made 
by polygraph examiners in criminal investiga- 
tions conducted by the military services. No 
attempt was made to disentangle the influence 
of the polygraph examination and record from 
that of the extrapolygraph sources of informa- 
tion available to the examiner. This would 
require a far more elaborate and expensive 
study than the one reported here. Accordingly, 
the data of the study bear only upon the 
validity of the examiner’s judgment, not upon 
the validity of the polygraph method or of 
the polygraph record itself. In the final 
analysis it is this judgment, and not the 
record, which influences any further action 
that may stem from the interrogation. Valida- 
tion of that judgment is required to deter- 
mine whether its use for such purposes is 
warranted. 

TABLE 1 
NUMBER OF CASES IN FINAL VALIDATION SAMPLE 

Type of examination      Deception indicated   No deception indicated 
Zone of Comparison       37                    52 
General Question Test    35                    33 


METHOD 


Selection of Cases 


Cases were drawn at random from a pool of 
criminal investigations conducted by the three 
branches of service during the years 1963-66. Selec- 
tion of cases was subject to the following restric- 
tions: (a) Cases judged “indeterminate” by the poly- 
graph examiner were eliminated. (b) Half the cases 
represented the General Question Test (GQT) and 
the other half the Zone of Comparison (ZOC) type 
of polygraph examination.3 (c) Within each examination 
type, there was an equal number of Deception 
Indicated (DI) and of No Deception Indicated 
(NDI) judgments by the polygraph examiner. 

Attrition due to the nature of the criterion used, 
as described below, affected the four resulting cate- 
gories differentially. Table 1 presents the final 
number of cases in each category. 


3 The GQT type of examination begins with a 
control question but thereafter presents control and 
relevant questions in random order. In the ZOC 
type of examination, each relevant question is inter- 
polated between a pair of control questions. The 
polygraph response to a relevant question is com- 
pared only with its surrounding control questions. 
In the case of the GQT type of examination, the 
polygraph response to a relevant question is com- 
pared with the level of response to control questions 
in general. 

Validation Procedure 


Selection of an appropriate criterion against which 
to validate the polygraph examiner’s judgment poses 
considerable difficulty. Obviously, prima facie evi- 
dence of guilt or innocence of a suspect would be 
ideal. Unfortunately, such evidence is usually not 
available. Court-martial decisions constitute a reason- 
able possibility for criterion use. However, legal 
technicalities and sufficiency of evidence, factors un- 
related to the question of whether a suspect actually 
committed a crime, often play a key role in these 
decisions. Their influence upon the criterion might 
artifactually reduce the validity of the examiner’s 
judgment. Confessions also merit consideration as 
the criterion, but experience has demonstrated that 
some confessions are false. Equally important, they 
provide at best only a partial criterion, since their 
occurrence is interpreted as proof of guilt, but their 
failure to occur is not proof of innocence. Thus, 
lack of confession is neither confirming for NDI 
judgments, nor disconfirming for DI judgments. If 
cases selected for the validation sample are re- 
stricted to those in which a confession has occurred, 
then NDI judgments cannot be adequately validated. 
Such judgments are confirmed by proof or evidence 
of innocence; confessions only provide a basis for 
disconfirming them. The converse obviously holds 
for DI judgments. 

In view of the deficiencies of the above criteria, 
the following validation procedure was adopted for 
the present study. 

Polygraph records were removed from the case 
file, and all references to the polygraph were deleted. 
The case files were then submitted for review to a 
panel of four JAG attorneys. The use of a four- 
member panel was based on a preliminary study 
involving nine JAG attorneys representing all three 
service branches. This study demonstrated that 
unanimity among four attorneys meant unanimity 
among all nine attorneys. On the other hand, 
unanimity for a three-member panel did not assure 
unanimity for the full nine-member panel. Each 
member of a panel reviewed and judged the cases 
independently. The attorneys were given explicit 
instructions to disregard all legal technicalities and 
to judge each case solely on the evidence contained 
in the file. As a precaution, each attorney was 
first required to eliminate files containing, in his 
judgment, insufficient evidence to warrant a positive 
determination of guilt or innocence. Otherwise, cases 
judged DI by the polygraph examiner might be 
judged not guilty by the JAG panel merely because 
the case files contained little information. Only those 
cases which resulted in a unanimous decision by 
the JAG panel were retained in the validation 
sample. 

Thus, the validity of the polygraph examiner’s 
judgments was estimated by determining the level 
of their agreement with the unanimous decisions of 
a panel of four JAG attorneys, each of whom made 
an independent judgment of the guilt or innocence 
of suspects. Unanimous agreement among legal ex- 
perts experienced at sifting evidence and instructed 
both to disregard technicalities and to eliminate cases 
where sufficient information is lacking would seem 
to provide a criterion which approaches the ideal 
of direct proof of guilt or innocence. 

TABLE 2 
COMPARISON OF POLYGRAPH EXAMINER AND UNANIMOUS JAG PANEL DECISIONS 
IN GQT CASES 

                       Polygraph examiner 
JAG panel     Deception indicated   No deception indicated   Total 
Guilty        31                     1                       32 
Not guilty     4                    32                       36 
Total         35                    33                       68 

An initial group of 227 case files was submitted 
to a panel of four United States Army JAG at- 
torneys. When attrition reduced the number of usable 
cases below required levels, an additional group of 
96 case files was submitted to a panel of four United 
States Air Force JAG attorneys. Seventy-eight of 
the first 227 and 2 of the later 96 cases were elimi- 
nated by panel members for lack of sufficient in- 
formation in the case files.4 The panel’s decision 
was unanimous in 91 of the 149 cases remaining 
from the first set and in 66 of the 94 cases remaining 
from the second set. 
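
The attrition figures above can be tallied in a few lines. This is only a bookkeeping sketch: the dictionary layout and names are illustrative, the count of 59 majority decisions is the figure reported later in the Results, and the final quantity (cases with neither a unanimous nor a majority decision) is an inference from those counts rather than a number stated in the report.

```python
# Counts reported in the text for the two batches of case files.
BATCHES = {
    "Army": {"submitted": 227, "insufficient": 78, "unanimous": 91},
    "Air Force": {"submitted": 96, "insufficient": 2, "unanimous": 66},
}
MAJORITY_CASES = 59  # reported later in the Results


def tally(batch):
    """Derive remaining and non-unanimous counts for one batch."""
    remaining = batch["submitted"] - batch["insufficient"]
    return {
        "remaining": remaining,
        "unanimous": batch["unanimous"],
        "not_unanimous": remaining - batch["unanimous"],
        "rejection_rate": 100 * batch["insufficient"] / batch["submitted"],
    }


TALLIES = {name: tally(batch) for name, batch in BATCHES.items()}
TOTAL_UNANIMOUS = sum(t["unanimous"] for t in TALLIES.values())
TOTAL_NOT_UNANIMOUS = sum(t["not_unanimous"] for t in TALLIES.values())

# Inference, not a reported figure: non-unanimous cases that were not
# majority decisions either (presumably evenly split four-member panels).
IMPLIED_NO_MAJORITY = TOTAL_NOT_UNANIMOUS - MAJORITY_CASES

if __name__ == "__main__":
    for name, t in TALLIES.items():
        print(name, t)
    print("Unanimous:", TOTAL_UNANIMOUS, "Not unanimous:", TOTAL_NOT_UNANIMOUS)
```

The 157 unanimous cases obtained this way match the combined validation sample analyzed in the Results.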


RESULTS 


Tables 2 and 3 summarize the data for cases 
involving the GQT and ZOC types of exami- 
nation, respectively. In Table 4 corresponding 
cell entries for Tables 2 and 3 have been 
combined. 

The percentages of agreement between 
polygraph examiner and JAG panel are 92.6 
for GQT cases, 91.0 for ZOC cases, and 92.4 
for all cases combined. Agreement of 90.3% 
was reached on cases judged DI by the poly- 
graph examiner and 94.1% on cases judged 
NDI. Chi-square tests of independence were 
performed on the data in the three tables. 
Chi-square for Table 2 is 49.7, for Table 3 
62.6, and for Table 4 112.4. These chi- 
squares are all significant at well beyond the 
.001 level. Chi-square (df = 1, p < .001) is 
10.8. Phi coefficients were also computed for 
the three tables. For the GQT cases, Phi = 
.86 (Phi max = .97); for the ZOC cases, Phi 
= .84 (Phi max = .97); and for all cases com- 
bined, Phi = .85 (Phi max = .97). 

4 The discrepancy between the Army and Air 
Force JAG Panel rejection rate was to a consider- 
able extent due to a single member of the Army 
Panel. It would have been less justifiable to eliminate 
this panel member post facto than to tolerate his 
unusually stringent criterion for sufficiency of 
evidence. 

TABLE 3 
COMPARISON OF POLYGRAPH EXAMINER AND UNANIMOUS JAG PANEL DECISIONS 
IN ZOC CASES 

                       Polygraph examiner 
JAG panel     Deception indicated   No deception indicated   Total 
Guilty        34                     4                       38 
Not guilty     3                    48                       51 
Total         37                    52                       89 
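
The agreement percentages, chi-squares, and phi coefficients can be recomputed from the four cells of each table. The sketch below assumes an uncorrected Pearson chi-square (the report does not say whether a continuity correction was applied), takes phi as the square root of chi-square over N, and does not attempt Phi max. The function name and dictionary keys are illustrative, and the Guilty/No-deception cell for all cases is read as 5 from the row total of 70.

```python
from math import sqrt


def two_by_two(a, b, c, d):
    """Summarize a 2 x 2 table laid out as:

                    DI    NDI
        Guilty       a     b
        Not guilty   c     d

    Chi-square is the uncorrected Pearson statistic (an assumption).
    """
    n = a + b + c + d
    agreement = 100 * (a + d) / n
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    phi = sqrt(chi2 / n)  # magnitude only; the association here is positive
    return {"n": n, "agreement": agreement, "chi2": chi2, "phi": phi}


CELLS = {
    "GQT (Table 2)": (31, 1, 4, 32),
    "ZOC (Table 3)": (34, 4, 3, 48),
    "All cases (Table 4)": (65, 5, 7, 80),
}
RESULTS = {name: two_by_two(*cells) for name, cells in CELLS.items()}

if __name__ == "__main__":
    for name, r in RESULTS.items():
        print(f"{name}: N={r['n']}, agreement={r['agreement']:.1f}%, "
              f"chi2={r['chi2']:.1f}, phi={r['phi']:.2f}")
```

Under these assumptions the sketch reproduces most of the reported figures, though not every decimal: from the cells of Table 3, for example, agreement works out to about 92.1% rather than the 91.0 given in the text.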

Unanimity on the part of the JAG panel 
is, of course, a stringent criterion. Generally, 
the initial decision of conviction by a court- 
martial is the product of a two-thirds majority 
of the court members, and, upon subsequent 
review, decision is by a simple majority of the 
appellate tribunal. It seems worthwhile, there- 
fore, to consider also data for cases in which 
the JAG panel decision was a majority one. 
(This, of course, is equivalent to a decision 
by three-fourths of the panel.) There were 
59 such cases. 

In view of the small number of cases in- 
volved, only the combined data for all cases 
are presented in Table 5. 

The percentage of agreement between poly- 
graph examiner and a majority of the JAG 
panel is 74.6%. Chi-square for Table 5 is 
14.7 (p< .001), and the corresponding phi 
coefficient is .49 (Phi max = .87). 


TABLE 4 
COMPARISON OF POLYGRAPH EXAMINER AND UNANIMOUS JAG PANEL DECISIONS 
IN ALL CASES 

                       Polygraph examiner 
JAG panel     Deception indicated   No deception indicated   Total 
Guilty        65                     5                        70 
Not guilty     7                    80                        87 
Total         72                    85                       157 


TABLE 5 
COMPARISON OF POLYGRAPH EXAMINER AND MAJORITY JAG PANEL DECISIONS 
IN ALL CASES 

                       Polygraph examiner 
JAG panel     Deception indicated   No deception indicated   Total 
Guilty        24                    10                       34 
Not guilty     5                    20                       25 
Total         29                    30                       59 


TABLE 7 
COMPARISON OF POLYGRAPH EXAMINER AND MAJORITY OR UNANIMOUS JAG PANEL 
DECISIONS IN ALL CASES 

                       Polygraph examiner 
JAG panel     Deception indicated   No deception indicated   Total 
Guilty         89                    15                      104 
Not guilty     12                   100                      112 
Total         101                   115                      216 


Table 6 compares levels of agreement for 
cases in which the JAG panel decision was 
unanimous with those in which its decision 
was a majority one. 

Finally, the majority decision cases were 
combined with the unanimous decision cases 
to provide level of agreement data for all 
cases involving at least a majority decision by 
the JAG panel. The data for all cases are 
presented in Table 7. 

The percentage of agreement is 87.5%, 
chi-square for Table 7 is 121.6 (p< .001), 
and the corresponding phi coefficient is .75 
(Phi max = .97). Cases judged DI by the 
polygraph examiner yielded 88.1% agree- 
ment and those judged NDI resulted in 86.9% 
agreement. 
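
The comparison summarized in Table 6 (agreement under unanimous versus majority panel decisions) amounts to a chi-square test on a 2 x 2 table of agree/disagree counts. A minimal sketch follows, computing the statistic from expected frequencies without a continuity correction (an assumption); the names are illustrative and 10.83 is the standard tabled critical value for df = 1 at the .001 level.

```python
# Rows: unanimous decisions, majority decisions. Columns: agree, disagree.
OBSERVED = [
    [145, 12],
    [44, 15],
]
CRITICAL_001 = 10.83  # chi-square critical value, df = 1, p = .001


def chi_square_independence(table):
    """Pearson chi-square from observed and expected cell frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


AGREEMENT_RATES = [100 * row[0] / sum(row) for row in OBSERVED]
CHI2 = chi_square_independence(OBSERVED)

if __name__ == "__main__":
    print([f"{rate:.1f}%" for rate in AGREEMENT_RATES])
    print(f"chi-square = {CHI2:.1f}, exceeds critical value: {CHI2 > CRITICAL_001}")
```

Computed this way the statistic comes out close to the 12.3 given in the note to Table 6, and well beyond the .001 critical value.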


TABLE 6 
COMPARISON OF LEVELS OF AGREEMENT BETWEEN POLYGRAPH EXAMINER 
AND JAG PANEL 

JAG panel              Agree   Disagree   Total 
Unanimous decisions    145     12         157 
Majority decisions      44     15          59 
Total                  189     27         216 

Note. χ² = 12.3, p < .001. 


DISCUSSION 


The data show clearly that the polygraph 
examiner’s judgment is predictive of the JAG 
panel decision. This is particularly true in 
cases where the evidence is sufficiently com- 
pelling to produce a unanimous decision by 
the panel. Level of agreement between the 
examiner and the panel decreases significantly 
for cases in which the panel’s decision is 
simply a majority one. A parsimonious inter- 
pretation of these relationships between ex- 
aminer and panel judgments would appear to 
implicate the case file as the only common 
source of information. In unanimous decision 
cases the file evidence may be more convinc- 
ing for the examiner, as it is by definition for 
the panel, than in majority decision cases. It 
is very unlikely, however, that the examiner’s 
judgment is determined solely by the file, or 
even by the file in combination with informa- 
tion provided by the criminal investigator. 
The polygraph examination is ordinarily not 
given to suspects whose guilt or innocence 
has already been substantially or finally estab- 
lished. Only where real doubt exists about the 
guilt status of the suspect is he permitted 
or asked to volunteer for an examination. The 
fact is that the case file at the time of the 
polygraph examination was less complete, and 
often far less complete, than when it was de- 
livered to the JAG panel. The examiner’s 
judgment, then, presumably reflects also the 
influence of other sources of information, such 
as the pretest interview, the polygraph ex- 
amination proper, and the polygraph charts. 
As noted earlier, the design of the present 
study makes it impossible to determine the 
relative contribution to the examiner’s judg- 
ment and to the validity of that judgment 
made by each of the possible sources of in- 
formation available to him. 

In particular, no conclusions are drawn 
about the validity of the polygraph record 
itself or about its contribution to the validity 
of the examiner’s judgment. It seems reason- 
able to assume, however, that the validity of 
that judgment sets an upper limit for the 
validity of the record. Since agreement of the 
examiner’s judgment with the criterion (i.e., 
unanimous JAG panel decision) is high, a 
substantial validity for the polygraph record 
is certainly possible. A study which investi- 
gates the joint and independent influence of 
the information sources available to the ex- 
aminer (with emphasis on the role of the 
polygraph charts) has been planned, but is 
awaiting approval by the Department of 
Defense, as well as the removal of practical 
obstacles to efficient data collection. 

Caution dictates that any conclusions 
drawn should be limited to criminal investiga- 
tions carried out by the military services. They 
should not be applied to personnel screening 
by the services or by other governmental 
agencies. Nor should they be generalized to 
use of the polygraph by individuals or agencies 
outside the federal government. The latter 
restriction is especially justified by the marked 
differences in favor of the military services 
with respect to quality control over the poly- 
graph examiner’s training and performance. 
Of course, wherever civilian agencies employ 
standards approaching those of the military 
services, the findings of this study will pre- 
sumably apply. 

A final comment is in order about the 
nature of the validation carried out in this 
study. As far as the polygraph examiner is 
concerned, he has made judgments about the 
truth of the suspect’s responses. On the other 
hand, the criterion itself is concerned with 
guilt or innocence determination. Accordingly, 
the labels “DI” and “NDI” by which the 
examiner categorizes suspects appear to be 
reasonably valid indicators of the guilt or 
innocence of the suspect, particularly in cases 
where the criterion decision is unanimous. 

As noted in the introduction, the polygraph 
interrogation is used within the military ser- 
vices to determine whether to continue or 
terminate the investigation of particular sus- 
pects in criminal cases where the evidence 
on hand is inconclusive. Whatever the basis 
may be for the examiner’s judgment, its use 
for such a purpose is strongly supported by 
the present study. 

MMPI SCORES AS RELATED TO AGE, EDUCATION, 
AND INTELLIGENCE AMONG MALE 
JOB APPLICANTS 


FRED J. THUMIN 1 


University of Missouri at St. Louis 


This study was designed to ascertain the relation of age, education, and 
intelligence to the 13 basic scales of the MMPI among normal male Ss in a 
competitive employment situation. Correlational analysis revealed that (a) 
age was negatively related to Scales F, 7, and 8, (b) with age and intelligence 
partialed out, education was positively related to Scales L, K, and 9, but 
negatively to Scale 0, and (c) with age and education partialed out, intelligence 
was negatively related to L and positively to 5. All multiple correlations using 
age, education, and intelligence as predictor variables and MMPI scales as 
criterion variables resulted in significant correlations except those involving 
Scales 1 and 2. 


The relationship between MMPI responses 
and the age, education, and intelligence of Ss 
has been studied by a number of investigators, 
but as yet the data are too limited to permit 
either cogent conclusions or firm generaliza- 
tions regarding the interactions of these 
variables. While certain of the findings are 
reasonably consistent from one report to an- 
other, others are seemingly inconsistent or, 
in some cases, contradictory. At least in part, 
the discrepancies may be attributed to dif- 
ferences in the nature of Ss, but failure to 
control or to partial out the influence of 
confounding variables is undoubtedly an ad- 
ditional contributing factor. 

Nonetheless, among relatively diverse popu- 
lations, the age variable was found to be 
positively related to Scales 1 and 2 (Aaronson, 
1958; Brozek, 1955; Calden & Hokanson, 
1959), to Scale 0 (Brozek, 1955; Calden 
et al., 1959), and to Scale 5 among male Ss 
(Applezweig, 1953; Brozek, 1955). Similarly, 
negative relationships between age and Scales 
8 and 9 have been reported by Applezweig 
(1953), Brozek (1955), and Gynther and 
Shimkunas (1966). 

Regarding education, perhaps the most con- 
sistent finding is that among male Ss, as 
educational level increases, a corresponding 
elevation appears on Scale 5 (Applezweig, 
1953; Brehm, 1954; Gough, 1954; Gynther 
et al., 1966). The relation between intelligence 
and MMPI performance is anything but 
clear, although Stanton (1956), Gynther et al. 
(1966), and others have found that, with 
male Ss, higher levels of intelligence are ac- 
companied by higher scores on Scale 5. Al- 
though the foregoing relationships are among 
those most frequently and consistently re- 
ported, it should be pointed out that excep- 
tions could be cited in every case, revealing 
the still very amorphous nature of the situa- 
tion. 

1 Requests for reprints should be sent to the 
author, School of Business Administration, University 
of Missouri, 8001 Natural Bridge Road, 
St. Louis, Missouri 63121. 

The objective of the present study was to 
ascertain, among normal individuals in a 
competitive employment setting, the relation- 
ship between the basic scales of the MMPI 
and the age, education, and intelligence of Ss. 
Unlike most earlier studies in this area, a 
special attempt was made to separate or 
isolate the effects of the three personal vari- 
ables under consideration. 


METHOD 


The sample consisted of 236 male Caucasians 
between the ages of 19 and 56 who had been 
referred to a psychological consulting firm by 36 
different business organizations for testing and in- 
terviewing between September, 1966, and June, 
1968. Although the great majority of the men were 
applying for positions with the firms which referred 
them for evaluation, some were current employees 
being considered for promotion or advancement. The 
Ss were evaluated for a variety of positions, including 
managerial, administrative, accounting, engineering, 
sales, and foreman. From an experimental standpoint, 
it is significant that Ss were similar and homogeneous 
from the standpoint of motivation and commonness 
of objective; that is, all were tested in the same 
realistic employment situation, and presumably all 
were strongly motivated to perform as well as 
possible on the tests. 

As part of the evaluation process, all Ss were 
administered the Otis Self-Administering Test of 
Mental Ability (Higher Examination, Form A) and 
the MMPI. Only the 13 basic scales of the MMPI 
were used for purposes of the present investigation. 
The Ss were also required to complete an employee 
inventory from which information pertaining to 
their chronological age and years of formal education 
was obtained. The mean age, education, and Otis 
raw score for the sample were 34.5 yr., 13.9 yr., 
and 52.6 points, respectively, with corresponding SDs 
of 7.9, 2.5, and 10.9. An Otis raw score of 52.6 
converts to a percentile score of approximately 82, 
using general adult population norms. 

The data were analyzed first by obtaining simple 
product-moment correlations between the MMPI 
scores and the age, education, and intelligence of 
Ss. In addition, a multiple regression analysis was 
performed for each MMPI scale using age, educa- 
tion, and intelligence as predictor variables. This 
analysis resulted in partial correlations which re- 
vealed the separate and independent effects of the 
predictor variables upon the scales, as well as 
multiple correlations which provided information as 
to the combined effects of the predictors. 

TABLE 1 
MMPI SCALES AS RELATED TO AGE, EDUCATION, AND INTELLIGENCE 

Column A: correlation coefficient (r). Column B: partial r. Column C: multiple R. 

Scale  | A: Age | A: Education | A: Intelligence | B: Age | B: Education | B: Intelligence | C: Multiple R | M | SD 
L      | .02    | .09    | -.15*  | [illegible] | .19**  | -.22** | .24** | 4.3  | 2.4 
F      | -.18** | -.04   | .09    | -.17*  | -.11   | .10    | [illegible] | [illegible] | 1.9 
K      | -.08   | .20**  | .11    | -.04   | [illegible] | .01    | .21** | 17.9 | 4.6 
1 (Hs) | -.08   | .02    | [illegible] | -.08   | .02    | -.02   | .08   | 11.5 | 2.7 
2 (D)  | .01    | -.12   | -.10   | -.02   | -.09   | -.04   | .13   | 16.6 | [illegible] 
3 (Hy) | -.01   | .14*   | .13*   | .02    | .09    | .07    | .16*  | 20.4 | 3.4 
4 (Pd) | -.08   | .01    | .12    | -.06   | -.06   | .12    | .14*  | 22.0 | [illegible] 
5 (Mf) | .08    | .12    | [illegible] | .13    | .05    | [illegible] | [illegible] | 23.0 | 4.0 
6 (Pa) | -.02   | .18**  | [illegible] | .03    | .12    | .10    | .21** | 8.8  | 2.1 
7 (Pt) | -.18** | .03    | .08    | -.17*  | -.03   | .05    | [illegible] | 24.0 | 3.3 
8 (Sc) | -.20** | .10    | .12    | -.18** | .03    | .05    | .22** | [illegible] | 4.2 
9 (Ma) | -.12   | [illegible] | .16*   | -.08   | .16*   | .05    | .24** | 19.8 | 3.8 
0 (Si) | .09    | -.26** | -.11   | .05    | -.24** | .03    | .27** | 17.4 | 6.8 

* p < .05 
** p < .01 


RESULTS AND DISCUSSION 


Table 1 shows the mean K-corrected raw 
score for each MMPI scale, the correspond- 
ing SDs, and the correlation coefficients be- 
tween the scales and the three personal vari- 
ables (age, education, and intelligence). The 
simple correlation coefficients appear in 
Column A, the partial coefficients in Column 
B, and the multiple coefficients (using age, 
education, and intelligence as predictor vari- 
ables) in Column C. Although the data do 
not appear in Table 1, age correlated 
-.19 with education, and -.21 with intel- 
ligence, while the correlation between educa- 
tion and intelligence was .48. All three values 
are significant at or beyond the .01 level of 
confidence. The negative relationship between 
age and education can probably be explained 
in terms of the greater inclination for younger 
individuals to carry their formal education 
further than did the members of the previous 
generation. 
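
As a rough check on how the three columns of Table 1 relate to one another, the partial and multiple correlations for a scale can be recovered from its zero-order correlations and the predictor intercorrelations given above. The sketch below does this for the L scale, assuming the rounded Column A values read from Table 1 (.02, .09, -.15). The function name and the matrix-inversion route are illustrative assumptions, not the author's computing procedure.

```python
import numpy as np


def partial_and_multiple(r_xx, r_xy):
    """Return (partial r's, multiple R) from correlation inputs.

    r_xx: predictor intercorrelation matrix (k x k)
    r_xy: zero-order correlations of each predictor with the criterion (k,)
    """
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    k = len(r_xy)

    # Full (k+1) x (k+1) correlation matrix with the criterion in the last slot.
    full = np.ones((k + 1, k + 1))
    full[:k, :k] = r_xx
    full[:k, k] = r_xy
    full[k, :k] = r_xy

    # Partial r of each predictor with the criterion, holding the other
    # predictors constant, read off the inverse correlation matrix.
    prec = np.linalg.inv(full)
    partial = -prec[:k, k] / np.sqrt(np.diag(prec)[:k] * prec[k, k])

    # Standardized regression weights give the squared multiple correlation.
    beta = np.linalg.solve(r_xx, r_xy)
    multiple_r = float(np.sqrt(beta @ r_xy))
    return partial, multiple_r


# Predictor order: age, education, intelligence (values reported in the text).
R_XX = [[1.00, -0.19, -0.21],
        [-0.19, 1.00, 0.48],
        [-0.21, 0.48, 1.00]]
# Zero-order correlations of the L scale with the predictors (Table 1, Column A).
R_L = [0.02, 0.09, -0.15]

partial_l, multiple_r_l = partial_and_multiple(R_XX, R_L)
print(np.round(partial_l, 2), round(multiple_r_l, 2))
```

With these rounded inputs the education and intelligence partials and the multiple R come out at about .19, -.22, and .24, in line with Columns B and C for the L scale; small discrepancies for other scales would be expected from rounding alone.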

It is noted in Column A that age correlated 
significantly with 3 of the 13 basic scales (viz., 
with F, 7, and 8), and all were in the nega- 
tive direction. Moreover, these relationships 
were virtually identical in Columns A and B, 
indicating that it made little difference 
whether or not the variables of education and 
intelligence were partialed out. The inverse 
relations between age and Scales F and 8 
are consistent with the recent findings of 
Gynther et al. (1966) and may well reflect 
an individual’s inclination to become in- 
creasingly practical and realistic in his think- 
ing as he grows older, at least through middle 
age. 

Perhaps even more interesting, however, is 
the fact that the significant relations between 
age and Scales 1, 2, 5, 9, and 0, reported by 
several other investigators, did not occur. 
There is no suggestion in the present data 
that individuals between early adulthood and 
middle age become increasingly concerned 
with bodily ailments, increasingly despondent, 
lose zest and enthusiasm, or grow socially 
introverted. On the other hand, the present 
sample contained relatively few Ss beyond 
middle age, so that many of the traits 
typically associated with advancing age 
simply may have gone undetected. Also, it 
cannot be overlooked that this sample con- 
sisted of job applicants and employees being 
considered for promotion, as opposed to col- 
lege students and hospital populations as were 
used in most previous studies. And there is 
evidence that elevations on a given scale may 
have different meanings or implications for 
different groups (Thumin, 1965). For ex- 
ample, a high Mf score among advertising 
men may reflect artistic and aesthetic in- 
terests rather than homosexual tendencies, as 
might be the case with certain other groups 
of Ss. 

It is perhaps not surprising to find that, as 
education increased, there was a correspond- 
ing decrement on Scale 0, suggesting that 
formal education facilitates or enhances the 
capacity and/or desire for social interaction. 
It seems equally feasible that a positive rela- 
tion existed between education and Scale 9, 
for those individuals with substantial amounts 
of enthusiasm, optimism, and energy might 
well be expected to carry their formal educa- 
tion further than would persons with some- 
what fewer endowments along these lines. 

In several instances, the interaction effects 
of education and intelligence are apparent. 
For example, in Column A both Hy and Pa 
are significantly related to education and 
intelligence, whereas the “uncontaminated” 
values in Column B drop below the level re- 
quired for significance. Moreover, the L scale 
is negatively correlated with intelligence in 
both Columns A and B, but positively (and 
significantly) correlated with education only 
in Column B where the influence of intelli- 
gence was partialed out. Apparently the 
brighter people with more education are able 
to “see through” and avoid the unrealistically 
favorable items comprising the L scale, 
whereas better educated people, minus the in- 
sight afforded by higher intelligence, are con- 
siderably more apt to subscribe to these kinds 
of items in their efforts to present themselves 
in the most favorable possible light to a 
prospective employer. Although the present 
study failed to reveal a significant relation 
between education and Scale 5 (as reported 
by a number of other investigators), the ex- 
pected positive correlation between intelli- 
gence and Scale 5 did occur. As a matter of 
fact, with age and education partialed out, 
intelligence correlated significantly with only 
two scales, 5 and L. 

Finally, it is noteworthy that 11 of the 13 
multiple correlations were statistically sig- 
nificant, and most of them at or beyond the 
.01 level. Thus, when age, education, and 
intelligence were used in combination as pre- 
dictor variables and weighted so as to maxi- 
mize their relation to the criterion variables 
(i.e., to the MMPI scales), nearly all were 
found to be of predictive value. The three 
highest correlations were with Scales L (.24), 
9 (.24), and 0 (.27). 
THE SVIB FOR WOMEN AND DEMOGRAPHIC VARIABLES 
IN THE PREDICTION OF 
OCCUPATIONAL TENURE 

In an ex post facto attempt to predict women’s “occupational tenure,” 198 
occupational therapists and 255 physical therapists were studied. Eight 
demographic variables and selected scales of the women’s Strong Vocational 
Interest Blank (SVIB) were incorporated in two types of prediction equations. 
A double cross-validation design was used to develop and test four multiple 
regression and four reciprocal averages equations. Five of the demographic 
variables correlated significantly with tenure, but none of the SVIB scales 
proved to be stable predictors. The method of reciprocal averages prediction 
yielded equations which proved more stable across samples and suffered less 
shrinkage than those produced by the multiple regression technique. 


The prediction of “occupational tenure” 
for women, defined here as the percentage of 
time worked in a field after completion of 
training for that field, has become an im- 
portant manpower research problem in recent 
years. A reliable method of predicting which 
women will have the greatest occupational 
tenure over their working lives, and who will 
therefore provide the greatest “return on 
educational investment,” would have consider- 
able practical value for educators. While the 
SVIB for Women has not been used to pre- 
dict tenure directly, several studies have 
related SVIB scores to tenure-related criteria. 

A follow-up study of two groups of high 
scorers on the SVIB Social Worker and Lab- 
oratory Technician scales (Harmon, 1968) 
found that the scales predicted “usual occupa- 
tion” reasonably well for those Ss who 
claimed one; however, the SVIB was not suc- 
cessful in predicting whether or not women 
would report a “usual career” 10 to 14 yr. 
after entrance to college. Another study (Har- 
mon, 1967) utilized a behavioral criterion 
of tenure, hypothesizing a relationship be- 
tween women’s working patterns and their 
SVIB Housewife and Own occupational scale 
scores. No differences were found on the 
Housewife scale between groups having dif- 
ferent work patterns, and only one group dif- 
fered in the expected direction from others 
on Own scale. Precollege SVIB scores were 
found to have some success in predicting col- 
lege major and occupation in a seven-year 
follow-up study by Nolting (1967) of 316 
University of Minnesota female graduates. 
Despite the lack of comparability of tenure 
indexes among these three studies, it appears 
that the SVIB is, at best, only moderately 
related to occupational tenure for women. 

1 Requests for reprints should be sent to Gary T. 
Athelstan, Assistant Director of Research, Center for 
Research and Education in the Health Occupations, 
The American Rehabilitation Foundation, 1800 
Chicago Avenue, Minneapolis, Minnesota 55404. 

Several important health occupations whose 
members are mostly women, including oc- 
cupational and physical therapy, are now 
suffering from critical personnel shortages. 
Since only about one-half of the women 
trained in these fields are working in them 
(Flint, 1968), it appears that retention of 
trained personnel may do more to help al- 
leviate the manpower shortage than recruit- 
ment or any of the other traditional solutions. 
Therefore, health manpower investigators are 
attempting to identify factors which are pre- 
dictive of tenure, and which can be used to 
select potential high-tenure people for training 
and employment. This study compares the 
relative efficiency of selected SVIB scales and 
demographic variables in the ex post facto 
prediction of tenure. 
METHOD 
Subjects 


Mailing lists were compiled from the membership 
rosters of professional organizations for occupa- 
tional therapists (OTs) and physical therapists 
(PTs), and from state certification records dating 
back to 1960. In 1966, each S identified in this 
manner was mailed a Career Patterns Questionnaire 
and the women’s SVIB (Form TW400-R, 1966) as 
part of a survey of health manpower in Minnesota 
(Flint, 1968). Three follow-ups were made over 
a period of about 6 wk., yielding returns from 198 
OTs and 255 PTs. The respondents constitute nearly 
80% of the qualified (fully trained) females in these 
fields in the state of Minnesota. 


Tenure Index 


An index of tenure was constructed from the 
questionnaire data available on each S. The index 
was operationally defined as the total number of 
years of professional work experience divided by 
the total number of years since graduation from 
the professional program. The values obtained were 
then converted to percentages so that the scale was 
continuous from 0 to 100%. This index permits 
women of different ages to be compared on the 
same measure, and is a convenient way of represent- 
ing “return on educational investment.” 
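
The index defined above reduces to a single ratio. The following is a minimal sketch of that calculation; the function name, argument names, and the input checks are illustrative assumptions and are not taken from the questionnaire scoring itself.

```python
def tenure_index(years_worked, years_since_graduation):
    """Percentage of time worked in the field since completing training."""
    if years_since_graduation <= 0:
        raise ValueError("years_since_graduation must be positive")
    if not 0 <= years_worked <= years_since_graduation:
        raise ValueError("years_worked must lie between 0 and years_since_graduation")
    return 100.0 * years_worked / years_since_graduation


# A therapist who worked 6 of the 10 years since graduation.
print(tenure_index(6, 10))
```

Because the denominator is time since graduation, a recent graduate with two years worked out of two scores the same 100% as a veteran with twenty out of twenty, which is the property that lets women of different ages be compared on one scale.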


Demographic Predictors 


Eight demographic variables were selected from 
the questionnaires as likely predictors of tenure for 
both OT and PT groups. These were marital status; 
year of birth; amount of own income; amount of 
spouse’s income; when the decision to become an 
OT or PT was made, ranging from “before entering 
high school” to “after completing college”; number 
of children living at home; presence of children in 
the 0-5 age group; and presence of children in the 
6-12 age group. 


Selected SVIB Scales 


A preliminary step-wise regression analysis of the 
relationship between tenure and all scales of the 
women’s SVIB was performed to select the scales 
most likely to predict the criterion. Four SVIB scales 
yielded significant correlations (p < .05) with tenure 
for the PTs: the Lawyer, Music Teacher, and Speech 
Pathologist scales, and Experimental Scale No. 2.² 
Only one predictor emerged for the OT group: 
SVIB Experimental Scale No. 2. 

In order to increase the number of scales to be 
used in the analysis, the Occupational Therapist and 
College Physical Education Teacher scales were added 
to the possible predictors for the OT group on a 
purely intuitive basis. The Math-Science Teacher 
scale was similarly included for PTs, and the House- 
wife and Social Worker scales, which were used by 
Harmon (1967, 1968), were selected as possible 
predictors for both groups. 

2 This scale is for research purposes and is not 
available for general use. 


Procedure 


Two methods of prediction were used in the study: 
linear multiple regression and reciprocal averages 
prediction (Weiss, 1963). The investigators were 
also interested in comparing the relative effective- 
ness of the two prediction techniques. It was felt 
that reciprocal averages prediction (RAP) might be 
a better method of prediction since it does not 
assume linearity and categorical data may be used. 
Also, some evidence (Weiss & Dawis, 1968) exists 
which suggests that RAP developmental prediction 
equations are more stable in cross-validation than 
are multiple regression equations. 

A double cross-validation design (Mosier, 1951) 
was used in which the OT and PT groups were each 
randomly split into two approximately equal-sized 
groups. In this design, eight prediction equations 
were developed, one multiple regression and one 
RAP equation for each of the four development 
groups. Each of the equations incorporated all of the 
selected demographic and SVIB variables. Within 
the OT and PT categories, each development group 
served as the other’s cross-validation group. 
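
The double cross-validation design can be sketched for the multiple regression case as follows. This is a minimal illustration on synthetic data: the predictors, weights, noise level, and random seeds are all assumptions made for the example, and the reciprocal averages technique is not reproduced here.

```python
import numpy as np


def fit_ols(x, y):
    """Least-squares weights (intercept first) for predictors x and criterion y."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, x):
    """Apply fitted weights to a predictor matrix."""
    return np.column_stack([np.ones(len(x)), x]) @ coef


def double_cross_validate(x, y, seed=0):
    """Split the sample into two random halves; develop an equation on each
    half and cross-validate it on the other.

    Returns [(development R, cross-validated r), ...], one pair per half.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    half = len(y) // 2
    groups = [order[:half], order[half:]]
    results = []
    for dev, val in (groups, groups[::-1]):
        coef = fit_ols(x[dev], y[dev])
        r_dev = np.corrcoef(predict(coef, x[dev]), y[dev])[0, 1]
        r_val = np.corrcoef(predict(coef, x[val]), y[val])[0, 1]
        results.append((float(r_dev), float(r_val)))
    return results


# Synthetic stand-in for one occupational group (hypothetical data).
rng = np.random.default_rng(1)
x = rng.normal(size=(198, 3))
y = x @ np.array([-0.6, 0.4, -0.3]) + rng.normal(scale=0.8, size=198)

for r_dev, r_val in double_cross_validate(x, y):
    print(f"development R = {r_dev:.2f}, cross-validated r = {r_val:.2f}")
```

The difference between each development coefficient and its cross-validated counterpart is the shrinkage that the design is meant to expose.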


RESULTS 


Forty-nine percent of the OTs and 69% of 
the PTs were employed within their respec- 
tive fields at the time of the survey. The OTs 
had worked as OTs an average of 59% of the 
time since completing their training. The 
average occupational tenure reported by PTs 
was 68%. 

Zero-order correlations between the various 
predictors and tenure for the OT and PT 
groups are reported in Tables 1 and 2. In- 
spection of both tables shows five demographic 
variables to be the best predictors of tenure 
when used singly: number of children; pres- 
ence of children in the 0-5 age group; pres- 
ence of children in the 6-12 age group; own’ 
income; and spouse’s income. Number of 
children, which showed the highest correlation 
with the criterion, was found to be inversely 
related to tenure, as was presence of children 
in both the 0-5 and 6-12 age groups. 

Tenure was positively related to own in- 
come and inversely related to spouse’s income. 
Marital status correlated highly with tenure 
in both PT groups, but in neither of the OT 
groups. Among PTs, marriage was associated 
with low tenure. A significant correlation was 
found between age and the criterion for all 
groups, although the relationship for PTs was 
curvilinear, and was not initially reflected in 
the correlation coefficient. Among the OTs age 
was negatively related to tenure: Younger 
women had a higher level of tenure than did 
older women. To some extent, the higher 
tenure of the younger women is an artifact of 
the index; it is easier for the young women 
to demonstrate high tenure, simply because 
they have had less time to participate in home- 
making and other activities which interfere 
with occupational tenure. 

The only one of these demographic vari- 
ables not having a significant zero-order cor- 
relation with either OT or PT tenure was 
“When decide on OT-PT career?” None of 
the SVIB scales showed significant correla- 
tions with the criterion. 

The RAP and multiple regression analyses 
produced results similar to the zero-order 
correlations. Predictors which appeared in 
only one of the two development groups 
within either of the OT or PT categories may 
be regarded as unstable. With the exception 
of “When decide on OT-PT career,” all 
demographic variables were stable and ap- 
peared as such in both development groups of 
the OT and PT categories. None of the SVIB 
scales proved to be stable predictors. Several 
of the SVIB scales emerged as significant in 
one or another of the developmental equa- 
tions, but all washed out upon cross-valida- 
tion. 


TABLE 1 

ZERO-ORDER CORRELATIONS BETWEEN PREDICTORS 
AND TENURE FOR OCCUPATIONAL THERAPISTS 
(Total N = 198) 

Variable                        | %     | r (Group 1) | r (Group 2) 
Marital status                  |       | -.15        | -.19 
  Never married                 | 20.2  |             | 
  Married                       | 79.8  |             | 
Number of children              |       | -.93        | -.62 
  None                          | 34.3  |             | 
  One or two                    | 37.4  |             | 
  Three or more                 | 28.3  |             | 
Own income                      |       | .60         | .39 
  Less than $5000/year          | 44.5  |             | 
  $5000 or more/year            | 55.5  |             | 
Spouse’s income                 |       | -.33        | -.58 
  None (includes unmarried)     | 27.8  |             | 
  Less than $10,000/year        | 44.9  |             | 
  $10,000 or more/year          | 27.3  |             | 
When decide on OT career        |       | .01         | -.17 
  High school or before         | 36.5  |             | 
  During college                | 55.0  |             | 
  After college                 | 8.5   |             | 
Have children aged 0-5          | 46.0  | -.42        | -.41 
Have children aged 6-12         | 66.7  | -.51        | -.50 

Variable                        | M     | r (Group 1) | r (Group 2) 
Age                             | 33.07 | -.28        | -.46 
SVIB OT                         | 46.14 | .00         | .08 
SVIB Housewife                  | 34.47 | -.18        | -.06 
SVIB College Phy. Ed. Teacher   | [illegible] | .04   | .16 
SVIB Social Worker              | 34.71 | [illegible] | -.07 
SVIB Experimental Scale No. 2   | 45.31 | .22         | .07 
Percentage of time worked       | 58.84 |             | 


TABLE 2 

ZERO-ORDER CORRELATIONS BETWEEN PREDICTORS 
AND TENURE FOR PHYSICAL THERAPISTS 
(Total N = 255) 

Variable                        | %     | r (Group 1) | r (Group 2) 
Marital status                  |       | -.47        | -.40 
  Never married                 | [illegible] |       | 
  Married                       | 66.3  |             | 
Number of children              |       | -.65        | -.68 
  None                          | 52.5  |             | 
  One or two                    | 31.4  |             | 
  Three or more                 | 16.1  |             | 
Own income                      |       | [illegible] | [illegible] 
  Less than $5000/year          | 34.9  |             | 
  $5000 or more/year            | 65.1  |             | 
Spouse’s income                 |       | -.54        | -.50 
  None (includes unmarried)     | 38.4  |             | 
  Less than $10,000/year        | 39.6  |             | 
  $10,000 or more/year          | 22.0  |             | 
When decide on PT career        |       | -.06        | .01 
  High school or before         | 31.0  |             | 
  During college                | 43.5  |             | 
  After college                 | 25.5  |             | 
Have children aged 0-5          | 65.5  | -.52        | -.38 
Have children aged 6-12         | 77.6  | -.46        | -.60 

Variable                        | M     | r (Group 1) | r (Group 2) 
Age                             | 35.04 | -.03        | -.16 
SVIB Housewife                  | 34.44 | -.15        | -.13 
SVIB Social Worker              | 31.39 | -.25        | -.15 
SVIB Experimental Scale No. 2   | 31.39 | .16         | .12 
SVIB Math-Science Teacher       | 33.16 | .00         | .18 
SVIB Lawyer                     | 23.27 | -.03        | -.18 
SVIB Music Teacher              | 23.04 | -.04        | -.08 
SVIB Speech Pathologist         | 28.97 | .07         | -.15 
Percentage of time worked       | 67.84 |             | 

Two of the stable predictors, number of 
children and age, demonstrated a significant 
(p< .05) curvilinear relationship with the 
criterion. Number of children is nonlinear for 
all groups except OT Group 2. Increasing age 
is associated with decreasing tenure for PTs 
to age 37, at which point tenure begins to 
increase. This change probably reflects reentry 
to the labor market among women whose 
youngest child has entered school; however, 
the mean criterion score for the oldest PTs is 
not as high as that of the youngest PTs. 
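
A U-shaped relationship of this kind can leave the product-moment correlation close to zero even though the predictor carries a good deal of information. The sketch below illustrates the point with synthetic, noise-free data whose minimum falls near age 37; the ages and the quadratic coefficients are assumptions chosen for illustration and are not the study's data.

```python
import numpy as np

# Hypothetical ages and a tenure curve that falls until about age 37 and
# then rises again, ending slightly lower for the oldest than the youngest.
age = np.arange(23, 52, dtype=float)
centered = age - 37.0
tenure = 50.0 + 0.15 * centered**2 - 0.15 * centered

# A straight-line index misses most of the relationship.
linear_r = np.corrcoef(age, tenure)[0, 1]

# A second-degree polynomial in age recovers it.
coefs = np.polyfit(age, tenure, deg=2)
quadratic_r = np.corrcoef(np.polyval(coefs, age), tenure)[0, 1]

print(f"linear r = {linear_r:.2f}, quadratic R = {quadratic_r:.2f}")
```

A method that scores categories of a predictor separately, or a regression that includes a squared term, can pick up such a pattern where a single linear coefficient cannot.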

Table 3 illustrates the level of prediction 
attainable when the RAP and multiple regres- 
sion techniques are used to combine optimally 
all SVIB scales and demographic variables. 
While the developmental multiple R’s are 
consistently higher than the reciprocal aver- 
ages correlation coefficients, the reverse is true 
of the cross-validated coefficients. This result 
is consistent with the findings of other studies, 
and suggests that RAP equations are more 
stable across samples and suffer less shrinkage. 


DISCUSSION 


The results of this study suggest that 
demographic variables are better predictors 
of occupational tenure for these groups than 
are selected scales of the women’s SVIB. A 
preliminary step-wise regression analysis sug- 
gested that several SVIB scales were sig- 
nificantly related to occupational tenure. 
However, none of the zero-order correlations 
between the SVIB scales and tenure were sig- 
nificant, and the scales added very little to 
the multiple regression or reciprocal averages 
prediction equations. 

The data also encourage the use of recip- 
rocal averages prediction in place of the usual 
linear multiple regression technique, especially 
when the nature of the predictor-criterion 
relationship is unknown and categorical pre- 
dictors are used. The RAP equations de- 
veloped in this study identified and in- 
corporated curvilinear relationships and also 
proved more stable than the multiple regres- 
sion equations. 


TABLE 3 

DEVELOPMENT AND CROSS-VALIDATED CORRELATIONS 
BETWEEN ALL PREDICTORS AND TENURE, USING 
MULTIPLE REGRESSION AND RECIPROCAL 
AVERAGES PREDICTION 

      | Multiple regression            | Reciprocal averages prediction 
Group | Development | Cross-validation | Development | Cross-validation 
OT 1  | [illegible] | [illegible]      | .69         | .61 
OT 2  | .74         | .60              | .70         | .65 
PT 1  | [illegible] | .60              | [illegible] | .68 
PT 2  | .74         | .60              | [illegible] | .67 

Although the demographic variables used in 
this study were found to be much better pre- 
dictors of tenure than selected SVIB scales, 
it should be noted that they were used as 
ex post facto predictors: They related to 
tenure differences, but were not accessible in 
this study for longitudinal prediction. The 
one demographic factor that might have been 
useful in prediction, the time when occupa- 
tional choice was made, turned out to be non- 
significant. On the other hand, the failure of 
the SVIB scales to relate to occupational 
tenure provides further evidence that the 
vocational behavior of women cannot be ad- 
equately accounted for by means which are 
usually successful for men. 

One of the major difficulties in predicting 
the working patterns of women is accounting 
for the variance that results from their enter- 
ing and leaving the labor market. It is likely 
that this behavior will seldom be reflected in 
scores on any of the standard occupational 
scales of the SVIB, since women leave an 
occupation more often for family and related 
reasons than for lack of interest in their jobs. 

However, the negative results obtained to 
date in using the women’s SVIB to predict 
occupational tenure should not discourage the 
use of this instrument in such studies. Some 
of the factors which differentiate women 
having high occupational tenure from those 
having low tenure are almost certainly tapped 
by the SVIB, although not by the existing 
occupational scales. A very recent study 
(Schissel, 1968) holds promise for scales de- 
veloped specifically to predict “career orienta- 
tion.” The most useful of several scales de- 
veloped by Schissel differentiated women who 
had been employed at least five consecutive 
years from those who were not employed, with 
41% distributional overlap between the 
groups. 

Unfortunately, as Schissel’s Career Orienta- 
tion Scale for women utilizes the men’s SVIB, 
it was not possible to evaluate the scale for 
the groups of OTs and PTs studied here. 
However, it is likely that a scale developed 
for the women’s SVIB will work at least as 
well as Schissel’s, possibly even better. 
FEEDBACK AND RESPONSE MODE IN PERFORMING 
A BAYESIAN DECISION TASK 1 

DAVID W. MARTIN 2 AND CHARLES F. GETTYS 3 
Ohio State University 


In a complex decision-making situation Ss received data generated by one of 
three hypotheses according to specified conditional probabilities. The Ss inferred 
which hypothesis had generated the data or estimated the probability of each 
hypothesis given the data. Feedback was given after each trial either as the 
hypothesis which generated the data or the probability that each hypothesis 
generated the data calculated by Bayes’ theorem. The two response conditions 
and two feedback conditions were combined factorially with a group of 16 Ss 
making 200 responses in each condition. The percentage of trials when Ss 
chose the most probable hypothesis was significantly higher for the groups 
responding with a single hypothesis than for the probability response groups, 
and higher for the Bayesian probability feedback groups than for the groups 
receiving no feedback. The Bayesian probability feedback group also gave 
probability responses which were much closer to the optimal probabilities 


than did the no-feedback group. 


Considerable research has suggested that in 
some situations man is suboptimal in process- 
ing probabilistic information and arriving at a 
decision (see Peterson & Beach, 1967, for a 
survey of the literature). This lack of opti- 
mality has usually shown man to be conserva- 
tive in probability estimation; the decision- 
maker’s estimates are usually less extreme 
than those of the optimal model, Bayes’ 
theorem (Phillips & Edwards, 1966). A num- 
ber of studies have succeeded in determining 
factors which influence this suboptimal be- 
havior. The present experiment asks if man’s 
inferences are influenced by experience within 
a decision-making environment in which 
several modes of response are required and in 
which feedback is provided concerning the 
appropriateness of these responses. 

In a decision-making situation, a decision- 
maker infers which state has occurred among 
a number of possible states of the world. After 
collecting relevant information about the 
world, he may be required to respond either 
by inferring which state has occurred or by 
estimating the probability that the world 
is in each state. The former case will be re- 
ferred to as nominal response mode and the 
latter as probability response mode. A de- 
cision-maker might improve his performance 
with the aid of feedback. Again, at least two 
possibilities exist for feedback mode. Either 
he may be informed about which state the 
world is in or, if posterior probabilities can 
be determined from Bayes’ theorem, he may 
be informed of the actual probabilities of each 
state of the world. Thus, feedback mode can 
be either nominal or Bayesian probability. 

1 The research reported in this paper was carried 
out at the Human Performance Center and was 
sponsored by the Aerospace Medical Research Lab- 
oratories, Aerospace Medical Division, Air Force 
Systems Command, Wright-Patterson Air Force 
Base, Ohio, under Contract No. AF 33(615)-2248 
with the Ohio State University Research Foundation. 

2 Requests for reprints should be sent to David W. 
Martin, Human Performance Center, Ohio State 
University, 404-B West 17th Avenue, Columbus, 
Ohio 43210. 

3 Now at the Engineering Psychology Laboratory, 
University of Michigan. 

In many situations either nominal or prob- 
ability responses may be used. A course of 
action usually is determined by the selection 
of only one event as that which is most likely 
to have occurred. However, perhaps this 
nominal type of response could be made more 
accurately if the decision-maker were required 
to respond probabilistically and his largest 
probability were taken to be his nominal 
choice. Both nominal and probability informa- 
tion regarding the optimality of a decision are 
also sometimes available. Particularly in a 
situation where events occur quite frequently, 
not only is information concerning what ac- 
tually happened available, but through the 
use of records and calculations, a Bayesian 
probability solution might also be determined. 
Where frequentistic data are not available, 
such as in a high-level command-control sys- 
tem where major war is a low frequency event, 
estimates from experienced experts might be 
used as probability feedback in place of an 
actual Bayesian calculation. It would be 
helpful to know whether the additional prob- 
ability feedback could be processed by de- 
cision-makers to aid them in their decision 
or to help them improve their performance. 


TABLE 1 

FOUR $P(D_{j,k}|H_i)$ CONTINGENCY TABLES 

$D_{j,k}$ | $H_1$       | $H_2$       | $H_3$ 
$D_{1,1}$ | .22         | .51         | [illegible] 
$D_{2,1}$ | .51         | .31         | [illegible] 
$D_{3,1}$ | .27         | .18         | [illegible] 
$D_{1,2}$ | .38         | [illegible] | .44 
$D_{2,2}$ | .19         | .57         | .27 
$D_{3,2}$ | .43         | [illegible] | .29 
$D_{1,3}$ | [illegible] | .23         | .41 
$D_{2,3}$ | [illegible] | .61         | .20 
$D_{3,3}$ | .14         | .16         | .39 
$D_{1,4}$ | .37         | .49         | .31 
$D_{2,4}$ | .52         | .30         | .25 
$D_{3,4}$ | .11         | .21         | .44 

The present experiment was an attempt to 
determine which types of response and feed- 
back modes produce responses which most 
closely approximate Bayesian responses in a 
given probabilistic environment, and whether 
this performance improves with practice. 


METHOD 
Subjects 


Sixty-four male university students were paid 
$1.25/hr for voluntary participation. 


Design 


Two response modes, nominal and probability, 
and two feedback modes, nominal and Bayesian 
probability, were combined factorially to yield four 
conditions. Sixteen Ss were assigned randomly to 
each of the four response-feedback conditions. The 
Ss under each condition were trained and tested in 
small groups (6-10 Ss). 


Procedure 


An abstract decision-making situation was used 
in which the world could be in one of three equally 
likely states called hypotheses, $H_i$ (see Table 1). 
In each of four contingency tables, called data 
classes, three possible states could occur with prob- 
abilities contingent upon which $H_i$ had been chosen 
from the three equally likely $H_i$ (e.g., if $H_1$ were 
chosen, State 1 would occur in Data Class 1 with 
a probability of .22, State 2 with .51, and State 3 
with .27). To generate the data for a trial the fol- 
lowing procedure was used: First, an $H_i$ was 
randomly selected, and then, in each of the four 
contingency tables, a data state was chosen for each 
data class according to the conditional probabilities 
associated with the $H_i$ selected. These probabilities 
may be designated $P(D_{jk}|H_i)$, the probability of a 
particular data state ($j$) for each data class ($k$) 
given the occurrence of an hypothesis ($i$). 

On each trial Ss were informed which data states
had been generated within each data class. Based
upon this information, the P(D_jk|H_i) contingency
tables, and the knowledge that the H_i were equally
likely prior to any information about data states,
Ss were required to make one of two types of
responses. Under nominal response mode, Ss made a
subjective selection of which H_i generated the data
states, and under probability response mode, Ss
estimated how likely each H_i was, given the data
states which had occurred, Ψ(H_i|D). After each S
had made his subjective response, feedback was
given. Under the nominal feedback mode, Ss were
told which H_i had generated the data states, and
under the Bayesian probability feedback mode, Ss
were given the optimal probability estimate com-
puted from Bayes’ theorem:

\[
P(H_i \mid D) = \frac{P(H_i)\,P(D \mid H_i)}{\sum_{m} P(H_m)\,P(D \mid H_m)}
\]

where P(H_i) is the probability of H_i prior to in-
formation about data states, P(D|H_i) is computed
from the four contingency tables, P(H_i|D) is the
posterior probability of H_i given the data, and D
is the occurrence of a particular sample of data.
As conditional independence was assumed, P(D|H_i)
= P(D_j1|H_i) P(D_j2|H_i) ... P(D_j4|H_i). The Ss
were not, however, permitted to do any written
calculations, and were not told how to calculate
Bayes’ theorem. Their estimates, therefore, are as-
sumed to be subjective opinions.
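
The feedback computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `posterior` and the likelihood values in `example_rows` are hypothetical and are not taken from Table 1; only the rule itself (equal priors, conditional independence across data classes, normalization over the three hypotheses) comes from the text.

```python
from math import prod


def posterior(priors, likelihood_rows):
    """Return P(H_i|D) for every hypothesis.

    priors: P(H_i), one value per hypothesis.
    likelihood_rows: one row per data class; each row holds P(D_jk|H_i)
        for the state j that was observed in that class, in hypothesis order.
    """
    # Conditional independence: P(D|H_i) is the product over data classes.
    joint = [
        prior * prod(row[i] for row in likelihood_rows)
        for i, prior in enumerate(priors)
    ]
    total = sum(joint)  # denominator of Bayes' theorem
    return [value / total for value in joint]


# Hypothetical likelihoods for two data classes (NOT the values of Table 1).
example_rows = [
    (0.50, 0.30, 0.20),
    (0.60, 0.20, 0.20),
]
example_posterior = posterior([1 / 3, 1 / 3, 1 / 3], example_rows)
print([round(p, 3) for p in example_posterior])
```

With equal priors the prior terms cancel, so the posterior is simply the normalized product of the selected table rows.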

All Ss received a 1-hr. instruction period con- 
sisting of a frequentistic definition of probability 
and a description of the task situation in terms of 
an objects-and-urns paradigm (H_i’s were urns, di-
mensions of objects were data classes, levels of each
dimension were data states). Five sample trials 
with feedback were then given to familiarize Ss 
with the type of decisions to be made and the 
mechanics of the task. 

Data and feedback were presented by means of 
closed-circuit TV. A display first appeared informing 
S which data state had occurred in each of the four 
data classes. For example, S would see a sheet of 
paper on his monitor which might list Data Class 1 
as being in State 2, Data Class 2 in State 1, Data
Class 3 in State 1, and Data Class 4 in State 1.
He could then refer to his four contingency tables
which were always available and pick out the four
appropriate P(D_jk|H_i) rows. Each S then made
either an estimate as to which H_i generated the
sample of data states or else a Ψ(H_i|D) for each H_i,
depending upon the response condition in which he
was operating. These responses were written on
the face of an IBM card. The Ss in the probability
response conditions made three-digit probability
responses having values between .998 and .001. In
the above example an S making a nominal response
might write down H_1, and an S making a prob-
ability response might write down .600, .250, and
.150, respectively, for each H_i (the actual prob-
abilities in this example being .769, .108, .123). After
all Ss had deposited their cards in a box in front
of them, feedback for that trial appeared on their
monitors. The feedback informed them either which
H_i had generated the data on that trial or the
P(H_i|D) for each H_i on that trial, depending on
the feedback condition in which they were operat- 
ing. A new trial then appeared in the same manner 
for 200 trials. This required four or five 2-hr. ses- 
sions depending upon the self-pacing of the group 
of Ss. 


RESULTS AND DISCUSSION 
Bayesian Choice Analysis 


Since all groups did not make the same 
type of response, for a comparison between 
all four groups the probability responses in 
two of the groups were transformed to nominal 
responses using the following assumption: Ss 
would have chosen the hypothesis receiving 
their highest probability had this been re- 
quired of them. Thus, the dependent measure 
used to compare the four groups was the 
percentage of times that Ss explicitly or 
implicitly chose the hypothesis which was 
most likely from the Bayesian calculations. 
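
The transformation and the dependent measure just described might be sketched as follows. The function names are illustrative assumptions, and so is the handling of ties (the first of several equal maxima is taken); the source does not say how ties were treated.

```python
def implied_choice(probabilities):
    """Index of the hypothesis that received the highest probability.

    Assumption: if several hypotheses tie, the first one is returned.
    """
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def percent_bayesian_choice(choices, bayes_posteriors):
    """Percentage of trials on which the (explicit or implied) choice
    matches the hypothesis with the highest Bayesian posterior."""
    matches = sum(
        choice == implied_choice(post)
        for choice, post in zip(choices, bayes_posteriors)
    )
    return 100.0 * matches / len(choices)


# The probability response from the Procedure example, converted to a choice.
subject_choice = implied_choice([0.600, 0.250, 0.150])
# Two trials: the first uses the Bayesian values from the Procedure example,
# the second is hypothetical.
score = percent_bayesian_choice(
    [subject_choice, 1],
    [[0.769, 0.108, 0.123], [0.200, 0.300, 0.500]],
)
print(subject_choice, score)
```

Nominal responses enter `percent_bayesian_choice` directly; probability responses are first passed through `implied_choice`.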

When the percentage Bayesian choice data 
were plotted for the four groups over blocks 
of 50 trials, significant block differences were 
found. To determine whether this block effect 
indicated that learning occurred, a linear com- 
ponent trend analysis of the main effect of 
blocks was computed. No significant linear 
component of trend was found, F (1/240)< 
1.00, and since there was no interaction of 
blocks with either response mode or feedback 
mode, no learning would seem to have oc- 
curred in any of the groups. The block dif- 
ferences were apparently due to random vari- 
ation of problem difficulty within blocks 
rather than learning. For this reason, the
percentages were averaged over all trials, and
one point was used to describe the data for
each group. Figure 1 indicates that both re-
sponse mode and feedback mode had an effect
on performance. The mean difference between
the two groups making nominal responses and
those making probability responses was 3.6%,
and the difference between the two groups
receiving nominal feedback and those receiving
probability feedback was 2.9%. An analysis
of variance of the percentage Bayesian choices
indicated that main effects due to response
mode, F(1/240) = 19.09, p < .01, and feed-
back mode, F(1/240) = 10.31, p < .01, were
significant; no significant interactions were
found. The fact that these significant effects
were small is also indicated by the proportion
of variance accounted for. Estimated ω² for
response mode was .06 and for feedback mode
was .03.

FIG. 1. The percentage of trials on which Ss
chose the H_i with the highest Bayesian probability
as a function of response mode and feedback mode.

While the differences between groups are 
significant, the differences might be con- 
sidered small in terms of the overall level of 
performance since Ss endorsed the same H_i
as Bayes at least 80% of the time. The im-
portance of this difference would, of course,
depend upon the cost of an incorrect choice.
An evaluation of S’s performance in terms
of this discrete-choice measure would thus 
indicate that superior performance is achieved 
when Ss make nominal responses and prob- 
ability feedback is given. 

Originally it was thought that the use of 
the probability response mode would cause Ss 
to exhibit a more exacting type of inference 
and thereby improve their inferred nominal 
response performance. The results indicate 
that the opposite was true. The task of stating 
probabilities reduced the percentage of oc- 
casions when S’s highest probability esti- 
mate was under the hypothesis receiving the 
highest Bayesian estimate. A possible ex- 
planation for this unexpected result would be 
that probability responses require different 
information-processing behavior than nominal 
responses require. When nominal responses are 
made, for example, perhaps only the few 
hypotheses judged to be most likely need be 
considered since S’s task is simply to choose 
the most likely hypothesis. If S’s response 
is a probability, he should be concerned with 
the likelihood of all the hypotheses. 

The finding concerning response mode 
would thus indicate that if the output re- 
quired of a decision-maker is simply a choice 
of the most likely hypothesis, his performance 
is better with a nominal response than it is 
with a probability response which involves 
the processing of more information. This 
would seem to be true regardless of the feed- 
back given. 

The Ss receiving probability feedback were 
superior to those receiving nominal feedback 
in terms of the choice measure. The prob- 
ability feedback may have given Ss a better 
idea of how probabilities ought to be aggre- 
gated or interpreted. A second possibility is 
that the probability feedback groups actually 
received more valid information regarding 
optimal performance than the nominal feed- 
back groups. The Ss receiving probability 
feedback always knew which hypothesis was 
most likely, but those receiving nominal feed- 
back were only informed which hypothesis 
generated the data. The generating hypothesis 
does not always have the largest Bayesian 
probability, and for this reason is not always 
the optimal nominal response. The hypothesis 
that generated the data will be different from
the hypothesis having the largest posterior
probability with a probability of 1 — P(H*| 
D), where H* is the hypothesis which gen- 
erated D. Thus, the nominal feedback groups 
did not always receive information regarding 
optimal response, while the probability feed- 
back groups did. To determine which of these 
explanations is true in a future experiment, a 
nominal feedback condition could be added in 
which the hypothesis with the highest 
Bayesian probability is given as feedback 
rather than the generating hypothesis. 


D Score Analysis 


By using a second dependent measure, the 
estimates of the two probability response 
groups could be compared to the probabilities 
calculated from Bayes’ theorem (Schum, 
Goldstein, Howell, & Southard, 1967). This 
measure, D, is based upon the difference be- 
tween the log of the likelihood ratio inferred 
from S’s response and the likelihood ratio 
from the calculation of Bayes’ theorem; 


\[
D = \log \frac{\Psi(H^{*} \mid D)}{\Psi(\bar{H}^{*} \mid D)} - \log \frac{P(H^{*} \mid D)}{P(\bar{H}^{*} \mid D)}
\]


The difference measure (D) thus is an ex- 
pression of the correspondence between S’s 
response and the optimal response. When 
these responses are equal S’s inferred likeli- 
hood ratio and the Bayesian likelihood ratio 
are the same and the difference between them 
is 0; D = 0. A positive D indicates that S
responded with a probability under H* larger 
than that calculated from Bayes’ theorem, and 
a negative D indicates that S’s probability 
was smaller (conservatism). 
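
A minimal sketch of the D computation follows. Two assumptions are made that the text does not settle: logarithms are taken to base 10, and the odds of H* against its complement are computed as p/(1 - p), which presumes that the three estimates sum to 1. The function names are illustrative only.

```python
from math import log10


def log_odds(p):
    """Log (base 10, by assumption) of the odds of H* against its complement."""
    return log10(p / (1.0 - p))


def d_score(subjective, bayesian):
    """D for one trial.

    subjective: S's estimate for the generating hypothesis H*.
    bayesian:   P(H*|D) from Bayes' theorem.
    Zero means S matched Bayes; a negative value means S was conservative.
    """
    return log_odds(subjective) - log_odds(bayesian)


# Values from the Procedure example, assuming H_1 generated the data:
# S wrote .600 where Bayes' theorem gives .769.
example_d = d_score(0.600, 0.769)
print(round(example_d, 3))
```

Because responses were restricted to the range .001 to .998, the odds are always finite and the logarithm is defined on every trial.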

To compare the probability response 
groups, an analysis of variance was performed 
on the D scores. A nonsignificant block effect, 
F(3/120)= 1.08, p > .25, indicated that per- 
formance was essentially flat across trials with 
no indication of learning. There was, how-
ever, a significant main effect due to feedback
condition, F(1/120) = 40.19, p < .01. When
using D score, which is a much more sensi- 
tive measure than percentage correct choices, 
the proportion of variance accounted for by 
feedback mode was .24. The mean of the D 
scores for nominal feedback was —.25 and for 
probability feedback, —.11. Since perfectly 
Bayesian behavior would yield a D of 0, both
groups of Ss could be considered to have
responded with probabilities less extreme than
those indicated by Bayes’ theorem, but the
group receiving probability feedback re-
sponded significantly closer to optimality than
did the nominal feedback group.

Since the diagnosticity of an item or a 
sample of items has been found to be a major
determinant of performance in past experi- 
ments, the responses in the present experiment 
were grouped into three diagnosticity cate- 
gories. Diagnosticity was not a major variable 
in the experiment, so it was not included in 
a factorial manner. The grouping into diag- 
nosticity levels was accomplished by ordering 
the 200 trials by the size of the Bayesian 
P(H|D) under the hypothesis which gen- 
erated the sample, and then separating these 
trials into three categories. The P(H|D) 
under the generating hypothesis in each diag-
nosticity category were as follows: low, .058-
.434; medium, .439-.770; and high, .781-
.954. The number of trials per block rep-
resenting each diagnosticity level was not
equal. The mean D scores for the two groups 
responding with probabilities are shown in 
Figure 2. An inspection of Figure 2 shows 
no indication of learning; yet there are obvi- 
ous differences in behavior for the two groups. 
The data from both groups would seem to 
support past research indicating less conserva- 
tive and even excessive estimates when item 
diagnosticity is low, and more conservative 
estimates when diagnosticity is high (Peterson 
& Miller, 1965). However, the performance 
for the group receiving probability feedback 
was, at all three diagnosticity levels, much 
closer to optimal than the corresponding per- 
formance for the nominal feedback group. 
In fact, the difference in divergence from
optimality between the two groups was larger
than the single means across diagnosticity 
levels previously reported would indicate, 
since suboptimal behavior above zero tended 
to make suboptimal behavior below zero ap- 
pear more optimal. 

No evidence of learning was found when 
the data were analyzed in 50-trial blocks, 
even though there were other significant main 
effects. An analysis of the first 50-trial block 
was undertaken to determine if learning took 
place in the first 10 to 50 trials (rapid learn-
ing). Again, no clear-cut trend effects were
found. Both dependent measures, however,
were so sensitive to between-trial diag-
nosticity differences that any rapid learning
effects might have been buried in this vari-
ability. Rapid learning might also have oc-
curred during the five preexperimental prac-
tice trials.

FIG. 2. Mean D score for probability-response
groups as a function of feedback mode (nominal
and probability) and sample diagnosticity (high,
medium, and low).

As an attempt to determine whether rapid 
learning effects could be found, the prob- 
ability-response, probability-feedback condi- 
tion was replicated using 16 different Ss. 
However, only 20 trials were used and prob- 
lems were randomized within these trials. 
This group performed at about the same D 
score level as the previous group, but again 
no short-term trend effects were found. 


Conclusions 


Thus, in the probability response groups, 
whereas the type of feedback had small but 
significant effects in terms of hypothesis 
choice, it had large effects in terms of the 
appropriateness of the size of the probability 
estimates. Figure 2 indicated that the group
receiving only nominal feedback behaved in
a much less optimal manner than the prob-
ability feedback group. The D scores for the
nominal feedback group were at least twice 
the size of those for the group receiving 
probabilities for feedback. 

These results concerning feedback would 
indicate that even in situations in which a 
decision-maker is simply required to choose 
a most likely hypothesis, his performance can 
be enhanced by presenting probability feed- 
back. A trade-off undoubtedly exists here, 
however. The improvement in choice perform- 
ance is small, even though significant, and 
the cost of determining a Bayesian probability 
vector (assuming that it can be determined) 
is likely to be high in most situations. How- 
ever, in a situation where a probability re- 
sponse is required, probability feedback seems 
to be very important if S is to make estimates 
that are at all close to Bayesian estimates. 
This is true particularly when evidence is 
highly diagnostic. In some situations, deter- 
mining Bayesian probabilities is an impos- 
sibility, but in those situations where it is 
possible, particularly when probability re- 
sponses are required, Bayesian probability 
feedback should be given.
Perhaps the only appropriate means of
implementing probability feedback in many 
situations where posterior probabilities are 
unavailable would be by training decision- 
makers in a situation where veridical posterior 
probabilities are available and transferring 
them to the situation where they are not. 
The present study does not, of course, answer 
questions concerning transfer, but it would 
seem that improved performance under prob- 
ability-response and probability-feedback con- 
ditions might transfer to conditions without 
feedback.

INTERFERENCE BETWEEN CONCURRENT TASKS
OF DRIVING AND TELEPHONING*

Twenty-four men were given the task of judging whether to drive through
gaps which might be larger or smaller than the car. They were also given a 
telephoning task of checking the accuracy of short sentences. Interference 
between the concurrently performed tasks was investigated. Telephoning mainly 
impaired judgments of ‘impossible’ gaps (p < .01). The control skills em- 
ployed in steering through ‘possible’ gaps were not reliably degraded, although 
speed of driving was reduced (p < .01). Driving increased errors (p< .01) 
and prolonged response times (p < .005) on the sentence-checking task. It is 
concluded that telephoning has a minimal effect on the more automatized 
driving skills, but that perception and decision-making may be critically 
impaired by switching between visual and auditory inputs. 


In the next decade a substantial increase is 
expected in the number of radiophones fitted 
to road vehicles. The user population will 
probably include many more car drivers on 
business trips who may be handling far more 
complex messages than those transmitted by 
present professional drivers employed in the 
police, fire, ambulance, and taxi services. The 
question arises as to whether this concurrent 
activity will impair driving skills sufficiently 
to increase the risk of accident on the road. 
To the authors’ knowledge there is no direct 
evidence from research or from accident sta- 
tistics which answers this question conclu- 
sively, although there is limited evidence that 
attention to auditory stimuli has little effect 
on the control skills employed in driving 
(Brown, 1965, 1966, 1967). This paucity of 
reliable information has led to the anomalous 
situation, in the United Kingdom at least, in 
which a driver is permitted to telephone but
risks prosecution if he performs some more
automatized task, such as shaving with an 
electric razor. 

It is clear that there are two main sources 
of interference between driving and telephon- 
ing. Having to use a hand microphone and 
having to manipulate push buttons to make 
or take a call will be inconvenient and may 
impair steering, gear changing, or other con- 
trol skills. This is a problem which may be 
solved by engineering advances and is not the 
concern of the present paper. A more impor- 
tant and lasting problem arises from the hy- 
pothesis that man can be considered to act 
as a single communication channel of limited 
capacity. The prediction from this hypothesis 
is that the driver will often be able to tele- 
phone only by switching attention between 
the informational demands of the two tasks. 
Telephoning could thus interfere with driving 
by disrupting visual scanning, since visual and 
auditory information would have to be trans- 
mitted successively. It could also interfere by 
overloading short-term memory and impair- 
ing judgment of relative velocity, which de- 
pends upon integration of successive samples 
of visual information. Less important, inter- 
mittent sampling of the telephone message 
could result in partial or complete failure of 
communication, depending upon the redun- 
dancy in the message. 

The object of the present experiment was 
to investigate this effect of divided attention
on judgments of clearance, control skills, and
the checking of auditory messages. 


METHOD
Driving Task 


Four different aspects of driving skill were mea- 
sured: 

(1) The ability to judge whether a gap between 
two obstacles was wide enough to be cleared by the 
car. The gap was set up at right angles to the track 
and could be varied from 3 in. smaller than the 
car to 9 in. wider, in steps of 3 in. There were thus 
three ‘possible’ gaps and two ‘impossible.’ This 
range of sizes was chosen on the results of a pretest 
which showed that Ss thought it possible to drive 
through a gap 3 in. wider than the car on about 
50% of the trials. 

Each trial drive required S to make judgments on 
20 gaps (4 of each size) which were arranged in 
random order and spaced at equal intervals around 
a circuit 1.5 mi. in length. The S was not allowed to
stop during a trial, but otherwise driving was self- 
paced. If he drove past the marker placed 18 ft. in 
front of each gap he was considered to have accepted 
the gap as possible and no other indication of his 
decision was required. Having accepted a gap he had 
to drive straight through it, regardless of its actual 
size. He indicated rejection of a gap by turning left 
at the marker and rejoining the track on the far 
side of the gap. This alternative route was designed 
to impose a comparable delay for accepted and 
rejected gaps, in order to minimize any biasing of 
the data by Ss who might be motivated to maintain 
speed instead of attempting to drive through the 
more difficult gaps. 

(2) A second measure of driving performance
was obtained by recording the number of possible 
gaps which were cleared successfully by S when he 
decided to drive straight ahead. He was considered 
to have failed if any part of the car touched either 
obstacle. 

(3) Third, speed of performance was obtained by
recording the time taken to drive around each com-
plete circuit of 20 gaps. 

(4) Finally, a set of measures was recorded auto-
matically from the frequency with which S used
the steering wheel and foot controls of the car and 
from the lateral and longitudinal accelerations he 
imposed on it. 
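
The layout of a trial circuit described under (1) can be sketched as below. The names `make_circuit` and `is_possible` are hypothetical, and a pseudo-random shuffle merely stands in for whatever randomization the experimenters actually used; the five clearances, the four gaps per size, and the treatment of zero clearance as 'impossible' follow the text.

```python
import random

# Clearances in inches relative to the width of the car: -3, 0, 3, 6, 9.
CLEARANCES_IN = list(range(-3, 10, 3))
GAPS_PER_SIZE = 4  # four gaps of each size give 20 gaps per circuit


def make_circuit(rng):
    """Return the 20 gap clearances of one circuit in random order."""
    gaps = [c for c in CLEARANCES_IN for _ in range(GAPS_PER_SIZE)]
    rng.shuffle(gaps)
    return gaps


def is_possible(clearance_in):
    """Only gaps wider than the car count as 'possible' (three of five sizes)."""
    return clearance_in > 0


circuit = make_circuit(random.Random(0))  # the seed is arbitrary
print(circuit)
print(sum(is_possible(c) for c in circuit), "possible gaps out of", len(circuit))
```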


Telephoning Task 


Messages transmitted over the radiophone pre- 
sented S with a reasoning test based on grammatical 
transformation, described in detail by Baddeley 
(1968). The S heard a series of sentences, each of 
which was followed by the letters “A” and “B.” Each 
sentence claimed to describe the order in which the 
following pair of letters would be spoken and S had 
to decide whether the description was true or false.
He responded by speaking the words “True” or
“False,” unless he had missed the message, in which 
case he asked for the next sentence. 

Examples of the sentences are as follows: 


Sentence                  Letter pair    Correct response
A follows B               BA             True
B precedes A              AB             False
A is followed by B        AB             True
B is not followed by A    BA             False
B is preceded by A        BA             False
A does not precede B      BA             True
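
The correct response to each item follows mechanically from the wording of the sentence and the order of the letter pair. The short Python sketch below is an illustration, not part of the original apparatus; the function name `sentence_is_true` and the phrase table `VERBS` are assumptions introduced here. It reproduces the six example items.

```python
import re

# Each verb phrase maps to (claims the first-named letter comes first, negated).
VERBS = {
    "precedes": (True, False),
    "follows": (False, False),
    "is followed by": (True, False),
    "is preceded by": (False, False),
    "does not precede": (True, True),
    "does not follow": (False, True),
    "is not followed by": (True, True),
    "is not preceded by": (False, True),
}
PATTERN = re.compile(r"^(?P<x>[AB]) (?P<verb>.+) (?P<y>[AB])$")


def sentence_is_true(sentence, pair):
    """Decide whether `sentence` correctly describes the letter `pair`."""
    match = PATTERN.match(sentence)
    if match is None or match["verb"] not in VERBS:
        raise ValueError(f"unrecognized sentence: {sentence!r}")
    x, y = match["x"], match["y"]
    if x == y or sorted(pair) != ["A", "B"]:
        raise ValueError("sentence and pair must each use both A and B")
    claims_x_first, negated = VERBS[match["verb"]]
    x_is_first = pair.index(x) < pair.index(y)
    holds = x_is_first == claims_x_first
    return holds != negated  # a negative sentence reverses the truth value


EXAMPLES = [
    ("A follows B", "BA", True),
    ("B precedes A", "AB", False),
    ("A is followed by B", "AB", True),
    ("B is not followed by A", "BA", False),
    ("B is preceded by A", "BA", False),
    ("A does not precede B", "BA", True),
]
results = [sentence_is_true(s, p) for s, p, _ in EXAMPLES]
print(results)
```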


As in Baddeley’s initial experiment, the test com- 
prised all 64 possible combinations of the following 
6 binary conditions: (a) positive or negative, (b) 
active or passive, (c) true or false, (d) precedes or 
follows, (e) A or B mentioned first in the sentence, 
(f) following letter pair AB or BA used. Five dif- 
ferent test forms were used, with the 64 items oc- 
curring in a different random order on each. It took 
about 2.5 sec. to read out each sentence and letter 
pair, and S had to respond as quickly as possible,
consistent with accuracy. There was a pause of about 
2 sec. between S’s response and the beginning of the 
next sentence. 

Telephoning performance was measured by scoring 
each response as correct or incorrect and by record- 
ing the time taken to respond after the letter pair 
had been spoken. 


Apparatus 


All tests were run in an Austin A40 estate car, 
which had manual gearshift, braking, and steering. 
The car was 5 ft. wide. Frequency of control move- 
ments was recorded on digital counters enclosed in a 
soundproof box. They were operated via micro- 
switches which were activated whenever the steering 
wheel, accelerator, brake, or clutch pedals were 
moved sufficiently to change the velocity of the car. 
Three other counters recorded lateral accelerations by 
pooling positive and negative readings within the 
ranges .1-.2 g, .2-.3 g, and greater than .3 g. Longi-
tudinal accelerations were similarly recorded. Counter
readings were obtained photographically before and 
after each trial. Driving time per circuit was mea- 
sured by stopwatch. 

The telephoning task was presented by E1 from a
mobile transmitter truck parked beside the track.
Messages were received in the car from a loudspeaker
mounted in front of S and he responded via a tele-
phonist’s headset. The E2 sat at a small console in
the back of the car and controlled the transmit/
receive selector of the telephone link by footswitch.
Thus S had no radiophone controls to manipulate
and any impairment of driving skills could be at- 
tributed to divided attention. The E1’s transmissions 
and S’s responses were recorded on magnetic tape 
for subsequent analysis of response times. 

The gaps were formed by pairs of obstacles 4 ft. 
high and 20 in. wide, constructed from hardboard 
on softwood frames and painted white. 




TABLE 1

EFFECT OF TELEPHONING ON ERRORS OF GAP-JUDGMENT

Clearance    % errors        % errors when    Increase
in gap       when driving    telephoning      in %        p*
(in.)        alone           concurrently     errors

  -3         28.0            47.2             19.2        <.01
   0         70.8            93.0             22.2        <.01
   3         79.5            81.2              1.7        >.05
   6         28.5            39.2             10.7        >.05
   9          7.7            18.5             10.8        =.05

Note. Data have been corrected for guessing.
* One-tailed Wilcoxon matched-pairs signed-ranks test of
significance (see Siegel, 1956, p. 75).


The Ss were 24 men within the age range 21-57 
(median age 41). Their car-driving experience, as 
judged from the length of time they had held a 
license, ranged from 3 to 37 yr. (median time 154 
yr.). Twenty-two men were volunteers from various 
establishments of the United Kingdom Ministry of 
Transport, the remaining 2 were drawn from the 
APRU research panel. Only 1 S had experience on 
mobile radiophones. 


Procedure 


After a short introductory explanation of the 
experimental objectives and method, S had 5 min. 
practice on sentence-checking alone in the stationary 
car, followed by one practice trial of driving alone 
during which he made judgments of the gaps and 
tried to drive through those which he considered 
were possible. Finally, he had one practice trial of 
driving and telephoning concurrently. 

This was followed immediately by six test trials: 
three in the order given during practice and three in 
the reverse order. Thus each S had 10 min. of testing 
on sentence-checking alone and two trials of driving 
alone. These provided individual baseline measures 
of performance, for comparison with the two trials 
of driving while telephoning. 

There was always a short pause between succes- 
sive trials of driving, during which the size of the 
gaps was altered according to a prearranged schedule. 
Thus S met the various sizes of gap in a different
order on each trial. 


RESULTS 


Learning on the tasks of driving and tele- 
phoning was negligible. Therefore data from 
the first and second trials in each condition 
were pooled to give overall measures of per- 
formance on telephoning alone, driving alone, 
and telephoning while driving.

Table 1 shows that errors of gap-judgment
were higher for all sizes of clearance when Ss 
also had to telephone, although the difference 
was statistically significant only for the two 
impossible gaps and for the largest possible 
gap. This means that when Ss operated under 
conditions of divided attention, they tried to 
drive through far more gaps that were smaller 
than the car and slightly fewer that were 
larger. 

Table 2 shows that skill in steering through 
possible gaps was not reliably impaired by 
telephoning, although there was a tendency 
for performance to be degraded when clear- 
ance was reduced to 3 in. 

Driving time per circuit was 361.3 sec. 
when Ss were driving alone and 385.2 sec. 
when they were telephoning concurrently. 
This 6.6% reduction in speed was statistically 
significant (p< .01, Wilcoxon test). Tele- 
phoning had no reliable effect on the fre- 
quency with which the car controls were used, 
nor did it affect the lateral and longitudinal 
accelerations imposed on the vehicle (p> 
.05, Wilcoxon test). 
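
As a worked check on the driving-time figures, the following Python lines compute the change in two ways. The variable names are illustrative. The 6.6% figure corresponds to the relative increase in circuit time; if the change is instead expressed as mean speed over the fixed 1.5-mi. circuit, the drop is slightly smaller.

```python
alone_sec = 361.3     # mean circuit time when driving alone
phoning_sec = 385.2   # mean circuit time when telephoning concurrently

# Relative increase in the time needed to complete one circuit.
increase_in_time = (phoning_sec - alone_sec) / alone_sec

# Relative drop in mean speed; the circuit length cancels out of the ratio.
reduction_in_speed = 1.0 - alone_sec / phoning_sec

print(f"time increase:   {100 * increase_in_time:.1f}%")
print(f"speed reduction: {100 * reduction_in_speed:.1f}%")
```

Under these definitions the time increase comes to about 6.6% and the speed reduction to about 6.2%.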

Table 3 shows that speed and accuracy of 
telephoning performance were both substan- 
tially impaired when Ss also had to drive. 

It can be inferred from the observed reduc- 
tion in speed when Ss had to telephone con- 
currently that they were attempting to gain 
time in which to handle the additional infor- 
mational load. However, Tables 1 and 3 show 
that this change in speed was insufficient to 
prevent mutual interference between gap- 
judgment and sentence-checking. It seemed
interesting to look further into the way in
which the additional driving time was used
under conditions of divided attention. This
was done by calculating for each S: (a) the
change in driving time and in gap-judgment
errors from driving alone to driving while
telephoning, and (b) the change in errors and
in response-time on sentence-checking from
telephoning alone to telephoning while driv-
ing.


TABLE 2

EFFECT OF TELEPHONING ON SKILL IN STEERING
THROUGH POSSIBLE GAPS

Clearance    % gaps cleared    % gaps cleared    Change in
in gaps      when driving      when telephoning  % gaps        p*
(in.)        alone             concurrently      cleared

   3         84.1              75.8              8.3 fewer     >.05
   6         95.2              95.5               .3 more      >.05
   9         99.2              99.3               .1 more      >.05

* Wilcoxon test.


TABLE 3

EFFECT OF DRIVING ON TELEPHONING PERFORMANCE

                                       Sentence-checking
Performance        Sentence-checking   when driving        Change in performance
measure            alone               concurrently        measure                 p*

% errors
(corrected for
guessing)          23.8                45.0                21.2 more               <.01
Response
time (sec.)        1.81                2.60                 .79 longer             <.005

* Wilcoxon test.


There was a significant positive correlation 
between increase in driving time and increase 
in errors of gap-judgment (tau = .314, p <
.0316; Kendall’s rank correlation coefficient,
see Siegel, 1956, p. 213). There are three pos-
sible explanations of this correlation: (a)
that judgment of clearance was impaired by
reduced speed per se, which seems improba-
ble, (b) that the increased time resulted from
the greater caution with which Ss drove 
through the additional impossible gaps they 
had judged incorrectly when telephoning, (c) 
that the increased time was taken in order to 
maintain performance on the telephoning task, 
which biased attention away from the task of 
gap-judgment. The latter explanation is sup- 
ported by the finding that increase in driving 
time and increase in errors on sentence-check- 
ing were negatively correlated (tau = —.441, 
p < .0026). The correlation between increase 
in driving time and increase in response time 
on sentence-checking was small and statisti- 
cally unreliable (tau=.074, p< .230). 
Therefore the inference is that Ss were using 
the additional driving time to maintain accu-
racy of sentence-checking. However, Expla-
nation (b) above is also tenable, suggesting
that the positive correlation between increase
in driving time and increase in errors of gap- 
judgment may, at least partly, have been an 
artifact of the experimental procedure which 
required Ss to drive through all accepted gaps, 
regardless of their actual size. 
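
The probabilities attached to tau can be obtained from the large-sample normal approximation for Kendall's rank correlation coefficient, in which tau is divided by its standard error under the null hypothesis. The following is a minimal sketch of that calculation and not necessarily the procedure the authors followed. The function name and the sample size N = 24 are assumptions made for illustration, since the number of Ss is not stated in this section.

```python
import math


def kendall_tau_p_value(tau: float, n: int) -> float:
    """Two-tailed p for Kendall's tau via the large-sample normal approximation."""
    # Under the null hypothesis tau has mean 0 and
    # variance 2(2N + 5) / (9N(N - 1)).
    variance = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))
    z = tau / math.sqrt(variance)
    # erfc(|z| / sqrt(2)) is the two-tailed area of the standard normal curve.
    return math.erfc(abs(z) / math.sqrt(2.0))


N_ASSUMED = 24  # hypothetical sample size, not given in this section

for tau in (0.314, -0.441):
    p = kendall_tau_p_value(tau, N_ASSUMED)
    print(f"tau = {tau:+.3f}  two-tailed p = {p:.4f}")
```

With the assumed N of 24, the sketch returns two-tailed probabilities close to those quoted above for tau = .314 and tau = -.441.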

There was a suggestion that the order of 
priority given to the two concurrent tasks 
was a function of age. Younger Ss tended to 
maintain performance on sentence-checking 
at the expense of increased errors on gap- 
judgment. The reverse tendency was observed 
among older Ss, although the effect of tele- 
phoning on driving time was comparable for 
all age groups. With the present small group 
of Ss these age effects were statistically un- 
reliable (p> .05; Kruskal-Wallis one-way 
analysis of variance, see Siegel, 1956, p. 184). 


DISCUSSION


Judgments of clearance were degraded 
across the complete range of gaps used in the 
experiment, although the effect was small on 
gaps larger than the car (see Table 1). There 
are three possible sources of this degradation: 
(a) interference between representations of 
visual and auditory stimuli within short-term 
memory, which would have impaired integra- 
tion of successive visual samples of gap size. 
As successive sampling is not essential to 
judgments of clearance, this seems an unlikely 
source of the observed interference. (b) a
relaxation of the criteria on which gap judg- 
ments were made. Although this explains the 
increase in errors on impossible gaps, it could 
not have been the sole source of interference, 
or more gaps of any size would have been 
accepted when Ss were driving and telephon- 
ing concurrently. In fact, fewer possible gaps 
were accepted under this condition (see Table 
1). (c) an impairment of perception resulting 
from switching between sensory modes. This 
also could not have been the sole source of 
interference, or the most difficult judgment 
(of 3 in. clearance, see Table 1) would prob- 
ably have been degraded most rather than 
least. The possibility that a ceiling effect was 
operating to stabilize errors at this level is 
ruled out by the finding that telephoning had 
a greater impact on judgments of zero clear- 
ance. 




It must be concluded that concurrent tele- 
phoning produced both a relaxation of criteria 
and an impairment of perception. During 
judgments of possible gaps, the tendency for 
impaired perception to produce errors of re- 
jection would have acted in opposition to the 
tendency for relaxed criteria to produce errors 
of acceptance. During judgments of impossi- 
ble gaps, both impaired perception and re- 
laxed criteria would have produced errors of 
acceptance. This explanation would account 
for the finding that divided attention had the 
differential effect of causing a significantly 
large increase in acceptance of impossible 
gaps, but a smaller and mainly insignificant 
increase in rejections of possible gaps. Since 
the latter effect was an increase in rejections, 
it may be inferred that performance on gap- 
judgment was affected more by impaired 
perception than by relaxed criteria, assuming 
that the relative importance of the two effects 
was stable across the range of gaps used. 

The perceptual-motor skills employed in 
steering through possible gaps were not reli- 
ably affected by telephoning (see Table 2). 
Performance was beginning to deteriorate 
when clearance was reduced to 3 in., but this 
source of interference is unlikely to present 
a major problem on the road, since even in 
the experiment Ss were rejecting about 80%
of these gaps as impossible (see Table 1). 
Steering skills are probably so automatized 
among trained drivers that they are minimally 
degraded when attention has to be diverted 
intermittently to an auditory stimulus. How- 
ever, it must be noted that drivers currently 
have to manipulate equipment to make and 
take calls, and these motor activities could 
interfere with steering skills as well as with 
the use of other manual controls. 

The increase in driving time observed when 
Ss were telephoning could have resulted sim- 
ply from the greater caution with which they 
drove through the additional impossible gaps 
they had judged incorrectly in this condition. 
This would account for the finding that errors 
of gap-judgment were positively correlated 
with driving time. The alternative explana- 
tion is that Ss deliberately reduced speed in 
order to handle the additional load. If the 
latter were true, the results indicate that the 
additional time was used to maintain per-
formance on the telephoning task, at the
expense of increased errors on gap-judgment. 
It is impossible to distinguish conclusively 
between these explanations on the experi- 
mental data alone, because driving and tele- 
phoning were temporally independent and be- 
cause driving time was measured only as a 
total score per trial. Further research would 
be needed to investigate these alternatives and 
to study the possibility that order of priority 
given to concurrent tasks of driving and tele- 
phoning may be a function of the driver’s 
age. It would also be necessary to investigate 
the timing of control skills before it could be 
concluded that these are entirely unaffected 
by divided attention. 

Both speed and accuracy of telephoning 
were affected by the driving task (see Table 
3). It is impossible to say how much of this 
decrement simply resulted from concurrent 
performance of the usual control skills em- 
ployed in driving and how much resulted 
from the experimental tasks of judging clear- 
ance and driving through gaps, because con- 
ditions of testing precluded any detailed 
recording of the temporal relationships be- 
tween driving and telephoning. It seems clear 
that calls via mobile radiophones will take
longer than ordinary calls: Table 3 shows
that messages containing little redundancy
are substantially affected by driving, so
repetitions would be necessary in order to
transmit all the information in practice.
Even with the greater redundancy of
plain speech messages, complete failure of 
communication could occur if division of at- 
tention were dictated by traffic conditions 
rather than by the content of the telephone 
message. 

The general conclusion must be that some 
mutual interference between the concurrent 
tasks is inevitable under conditions of tele- 
phoning while driving on the road. The re- 
sults suggest that, although more automatized 
control skills may be affected minimally by 
this division of attention, some perceptual 
and decision skills may be critically impaired. 
The extent to which this impairment is a 
function of the driving task, the informa-
tional content of the telephone message, and
the individual characteristics of the driver 
must remain a subject for further research. 
The major component of evaluation from a 2-day assessment program covering
47 members of a large national marketing organization consisted of ratings 
of degree of active participation in the group situational exercises, followed 
by ratings of administrative and decision-making ability. Paper-and-pencil 
ability tests and personality inventories were less clearly related to assessments 
of managerial potential. Ratings of management potential developed from a 
careful review of company personnel records were as highly correlated with 
the assessment center data as were overall ratings from the 2-day program, 
except for ratings dealing with interpersonal behavior. 


One of the most pressing problems which 
industry will face over the coming decade is a 
severe shortage of qualified management per- 
sonnel. The combined effects of significant ex- 
pansion in business activity along with a 
labor force in which there will be a decline in 
the absolute numbers of participants in the 
key manager age bracket of 35-45 are more 
and more forcing companies to take a hard 
look at the procedures which they utilize for 
the identification of management talent. As a 
result, the current focus is on the early iden- 
tification of talent, as highlighted by the 
classic study undertaken by the Standard Oil 
Company of New Jersey (Laurent, 1961). 

Increasingly, a variety of assessment tech- 
niques are being utilized in an effort to pre- 
dict managerial potential, and studies are 
showing that a spectrum of inputs can add 
valid information to the old standbys of group 
tests of ability and temperament. Although 
previous research has often reported negative 
findings, recent research suggests that the 
interview can be a valid assessment procedure 
if carefully conducted (Ghiselli, 1966; Prien, 
1962). And, contrary to the assumptions of 
many industrial psychologists, clinical tech- 
niques have been shown to have promise 
(Albrecht, 1964). 

An approach which has been used increas- 
ingly involves situational tests (Flanagan, 
1954) in an effort to approximate closely the 
kinds of behaviors required for managerial
jobs. A number of companies have been 
experimenting with systematic management 
assessment programs built largely around 
situational exercises (Bray & Campbell, 
1968; Bray & Grant, 1966; Grant, Katkov- 
sky, & Bray, 1967; Greenwood & McNamara, 
1967; Hardesty & Jones, 1968). Such pro- 
grams are designed to bring potential candi- 
dates for management together for several 
days, to have them participate in a number of 
group and individual exercises, to have them 
take a battery of ability and personality 
tests, and then to have a team of observers 
distill the results of the program into a series 
of predictions of probable management po- 
tential for each of the program participants. 

This is clearly a costly process. In addi-
tion to the facilities and materials required to
run such a program, the even greater costs of
several days' attendance by the participants
and by the staff of observers suggest that
management invests a great deal of money
in the assessment center process in order to
arrive at a relatively straightforward predic-
tion of promotability. While research has
shown, at least at AT&T, that assessment 
evaluations are predictive of subsequent pro- 
motion into management and of salary growth 
(Bray & Grant, 1966), we are still left with 
several rather gnawing questions: 

(1) How much does the program contribute 
beyond what is already known about the can-
didates? In one sense the answer is “Noth-
ing,” since typically the effectiveness of the
assessment process is subsequently validated
against what happens in the organization un- 
der more naturalistic conditions, that is, 
whether or not in the normal course of events 
an individual is promoted. If this is the ulti- 
mate criterion against which an assessment 
program is validated, one might question 
“Why bother with the program in the first 
place?” The answer, of course, is that one 
hopes the assessment program will be able to 
identify promotable people earlier in their 
careers, that it will help to clarify some of 
the skills important in promotion, and that it 
will perhaps identify some people who should 
be promoted but who might under normal 
circumstances be overlooked. Hopefully, also, 
the program will fulfill to some extent a per- 
sonal development function by providing prac- 
tice in group situations, individualized feed- 
back regarding observed strengths and weak- 
nesses, and greater understanding regarding 
the caliber of the competition participants 
are up against. But where is the trade-off, no 
matter how valid the assessment process, be- 
tween the use of an assessment program and 
the prevalent approach to managerial selec- 
tion used in most organizations of “letting 
the chips fall where they may”?

(2) A parallel question concerns how much 
is gained by the use of situational tests—the 
group and individual exercises which really 
are the most costly aspects of the assessment 
center approach—beyond what can be ob- 
tained by more traditional paper-and-pencil 
tests. And if an additional input is obtained 
from situational exercises, is it an increment 
sufficiently large to justify the cost? 

(3) Another question asks how much re- 
dundancy there is in the data collected in a 
typical assessment program: How many of 
the tests and exercises overlap, and to what 
extent can the program be streamlined in an 
effort to make it both more efficient and less 
expensive? 

The purpose of the research reported in 
this paper was to evaluate preliminary data 
from one management assessment program 
from a number of points of view. In any 
evaluation of this type, of course, the most 
pressing problem is the lack of criteria. The 
nature of the program calls for a predictive 
validation strategy, since this is what the 
program is all about: the prediction of man-
agement potential. And this is what the AT&T
research has focused upon, though even here 
the full assessment of validity was felt to be 
a bit premature and the predictive data were 
allowed to mellow for only 8 as opposed to 
the originally intended 10 yr. (Bray & Grant, 
1966). In the present instance, it was neces- 
sary to obtain some evaluation of the pro- 
gram’s value before a 10- or 15-yr. delay, and 
toward this end this research was designed to 
relate the various components of the program 
to several criteria of value. Although this is 
far from an adequate validation, and purely 
concurrent in nature, it was hoped that the 
analysis would provide some basis for evalu- 
ation by looking at the relationship between 
the program prediction and (a) external cri- 
teria, in the form of some assessment of cur- 
rent value to the organization of the program 
participants, (b) internal criteria, in the sense 
of relationship between the program com- 
ponents and the overall management poten- 
tial evaluation flowing from the 2-day pro- 
gram, and (c) parallel criteria, in the sense 
of presently available evaluations of mana- 
gerial potential which were representative of 
those utilized in the normal promotional sys- 
tem of the organization. 

The research was also designed to determine 
what makes up the final evaluation—what 
gets the major weight in the assessment pre- 
diction in this particular organization, and to 
what extent there is redundancy in the mea- 
sures which are collected. At the same time, 
an evaluation of the extent to which the situ- 
ational and the paper-and-pencil methodolo- 
gies yielded comparable assessments could be 
viewed as a rough approach to obtaining some 
degree of construct validation of the concepts 
covered in the program. Another goal of this 
analysis was to determine how to reduce 
meaningfully the number of variables col- 
lected so as to understand better the relation- 
ships among them and to facilitate analysis of 
the data. 


DATA COLLECTION


The Ss in this study were 47 college-edu- 
cated male employees engaged in marketing 
activities for a large technology-based organi-
zation. The Ss came from all areas of the
country and had been employed by the or- 
ganization for an average of 7.3 yr. at the 
time of assessment. Their average age was 33. 
Some Ss were recently promoted first-line
managers, and some were nonmanagerial per- 
sonnel who had been identified by manage- 
ment as having management potential. They 
were selected at random within the criteria 
of being either new managers or senior per- 
sonnel considered promotable in the very 
near future to a management position. Groups 
of managers and nonmanagers were kept 
separate for the various situational exercises 
in the program. 

The assessment program employed was 
very similar to that described by Bray and 
Grant (1966). Group situational exercises 
were identical to those described by Green- 
wood and McNamara (1967): (a) Leader- 
less Group Discussion, (b) Task Force Com- 
mittee, and (c) Manufacturing Game. Indi- 
vidual situational exercises included (a) an 
In-Basket of 25 items which the individual 
was given an hour and a half to cover. He 
was then interviewed by a member of the 
assessment staff regarding his decisions and 
rated on this performance. (b) A Stock Mar- 
ket Exercise, in which the individual had to 
respond to hypothetical market fluctuations 
investing a certain sum of money. (c) A Job 
Environment Report in which the individual 
was requested to describe his job in narrative 
form. 

Paper-and-pencil tests were also given 
throughout the 2-day program and consisted 
of (a) Concept Mastery Test, Form T, (b)
School and College Ability Tests (SCAT),
Form U (Numerical Part only), (c) Gor-
don Personal Profile, (d) Allport-Vernon-
Lindzey Study of Values, (e) Leadership
Opinion Questionnaire, (f) Ghiselli Self-De-
scription Inventory, (g) Risk-Taking Scale
(Williams, 1965), and (h) a background and
contemporary data questionnaire of personal 
history questions. Keys for general manage- 
ment potential and for self-confidence had 
been previously developed, and two scores 
were derived for each program participant. 

At the conclusion of the two days of situ-
ational and paper-and-pencil tests, the ob-
servers who formed the assessment staff, all
of whom were operating management person-
nel at least two levels above program partici-
pants, met to discuss the various exercises and
to consolidate the ratings they had made of
the participants on traits which were appro-
priate for the exercise which they observed.
Twelve trait ratings were derived as a result
of all of the situational exercises: (a) Aggres-
siveness, (b) Persuasive or Selling Ability,
(c) Oral Communications, (d) Planning and
Organization, (e) Self-Confidence, (f) Re-
sistance to Stress, (g) Written Communica-
tions, (h) Energy Level, (i) Decision Mak-
ing, (j) Interpersonal Contact, (k) Adminis-
trative Ability, and (l) Risk Taking.

These ratings were on a 5-point scale from 
1—outstanding—to 5—definitely below ave- 
rage. In addition, an overall evaluation of 
management potential, which we have termed 
our “internal criterion,” was arrived at in 
this evaluation session based on ratings in the 
situational exercises and scores on the various 
tests. Analyses have shown adequate reliabil- 
ity for these types of ratings (Greenwood & 
McNamara, 1967). 

In an effort to obtain a comparable evalu- 
ation of management potential based only on 
data readily available to the organization, two 
experienced managers were asked to review 
all of the personnel records for the 47 par- 
ticipants in the program and whatever addi- 
tional information would normally be avail- 
able to them in making an initial promotional 
decision. They then rated the 47 people on 
management potential utilizing a similar rat- 
ing scale to that employed in the assessment 
program. We have termed these ratings a 
“parallel criterion.” 

These two managers were assistants to dis- 
trict marketing managers who quite fre- 
quently became involved in the evaluation of 
qualifications of candidates for promotion in 
the organization. In effect, the task given 
them was to perform the type of analysis they 
would ordinarily go through in their day-to- 
day job activities in coming up with candi- 
dates for promotion to recommend to their 
district manager. The instructions to them 
stated 


This will be essentially a subjective process of 
distilling all of the information which is available 
on 47 men to arrive at a judgment regarding
their long-term potential to the company. By 
long-term we don’t necessarily mean until age 65, 
but probably out for a period of 10-15 years. 
After going through all of the materials, record 
your best judgement of the highest level in the 
company which you can see this individual attain- 
ing, based upon the descriptions given on the 
attached pyramid of position levels. Proceed as if 
you were trying to select a candidate for an im- 
portant promotion. 


a. Work through all the materials in the person- 
nel jacket. Specifically look at educational and 
experience qualifications, appraisal evaluations, 
any special commendations or accomplishments. 

b. Review his performance evaluations. In addi- 
tion to his appraisal ratings, see how he is 
ranked by management among his peers. Is he 
on the promotion list? The outstanding em- 
ployee list? What is his sales record? Rate of 
earning growth? How was he evaluated in 
training?

c. Talk with his immediate manager, and, if 
desirable, his prior managers. Make liberal use 
of the telephone. Finally, make a judgement, 
independently, and record it. 


Each of these managers spent over a day 
independently reviewing the 47 personnel 
jackets and arriving at their ratings. Al- 
though they saw this as a tedious task, they 
were able to make these predictions. It 
should be recognized that this parallel cri- 
terion was not based upon any face-to-face 
confrontation with the assessee; the evalu- 
ators indicated that they were not personally 
familiar with the people they were rating,
with the exception of one or two cases, and
only rarely did they see fit to utilize the tele-
phone for any kind of references from present
or past managers. So their ratings came al-
most exclusively from their personal assess-
ment of company personnel records. And,
needless to say, they had no knowledge of the
ratings assigned in the 2-day assessment pro-
gram.


TABLE 1

FACTOR ANALYSIS OF ASSESSMENT RATING SCALES

                                  Rotated factor loadings
Rating scale                        1      2      3    Communality

Persuasive or Selling Ability      95    -15     04        92
Aggressiveness                     89     02    -15        89
Energy Level                       82    -03     14        72
Interpersonal Contact              81     11     02        77
Oral Communications                79     13     08        75
Self-Confidence                    73     23     12         ?
Decision Making                    23     66    -11        62
Planning and Organization          07     62     08        49
Written Communications             01     62    -36        46
Administrative Ability             28     56     13        59
Risk Taking                       -28     53      ?         ?
Resistance to Stress               52     26     46        73

Note. Decimals omitted. ? = entry illegible in the source.


ANALYSIS AND RESULTS 


In an effort to clarify the concepts being 
evaluated in the exercises and to reduce the 
number of variables being dealt with to a 
more manageable number of relatively unique 
scales, the trait ratings which synthesized the 
observers’ ratings in the situational tests were 
factor analyzed. For this analysis, the 12 
ratings were intercorrelated, the maximum 
row element was inserted in the main diagonal 
as an estimate of communality, and the mat- 
rix was factored by the method of principal 
components. Between two and six factors 
were rotated obliquely using Carroll’s (1960) 
biquartimin rotation. The three-factor solu- 
tion shown in Table 1 seemed best to repre- 
sent the structure of this particular matrix. 
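
The first two steps of this procedure (placing the largest element of each row in the main diagonal as a communality estimate, then factoring the reduced matrix) can be sketched as follows. This is an illustrative sketch only: the 4 x 4 correlation matrix is hypothetical, the function names are not from the original study, only the first unrotated factor is extracted (by power iteration), and the oblique biquartimin rotation is omitted.

```python
import math


def insert_communalities(r):
    """Copy r, replacing each diagonal entry with the largest absolute
    off-diagonal correlation in that row (the communality estimate)."""
    n = len(r)
    reduced = [row[:] for row in r]
    for i in range(n):
        reduced[i][i] = max(abs(r[i][j]) for j in range(n) if j != i)
    return reduced


def first_principal_factor(m, iterations=500):
    """Loadings on the first unrotated factor, found by power iteration."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    # Scale the unit eigenvector so squared loadings sum to the eigenvalue.
    return [x * math.sqrt(eigenvalue) for x in v], eigenvalue


# Hypothetical intercorrelations among four trait ratings.
R = [
    [1.0, 0.8, 0.7, 0.2],
    [0.8, 1.0, 0.6, 0.1],
    [0.7, 0.6, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
]

reduced = insert_communalities(R)
loadings, eigenvalue = first_principal_factor(reduced)
print("communality estimates:", [reduced[i][i] for i in range(len(R))])
print("first-factor loadings:", [round(x, 2) for x in loadings])
```

Further factors would be obtained by removing the first factor's contribution from the reduced matrix and repeating the extraction before any rotation is applied.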

Factor 1 is clearly an activity factor, with 
heavy loadings from ratings of persuasiveness, 
aggressiveness, energy level, interpersonal 
contact, oral communications, and self-confi- 
dence. The relatively pure factor picture is 
one of active participation in the group situ- 
ational exercises. 

Factor 2 reflects more individually based 
administrative kinds of skills, with heavy 
loadings of decision making, planning and 
organizing, written communications, and ad- 
ministrative ability. 

The third factor, while not showing as large 
loadings as the previous two, appears to re- 
flect a component of resistance to stress with 
loadings for resistance to stress and risk tak- 
ing. In large measure this is a residual factor 
since both of these traits have their primary 
loadings on Factors 1 or 2. An examination 
of the rotations of more than three factors for 
these data does not suggest additional mean- 
ingful factors; it would seem that at most 
three components of evaluation are being de- 
rived from the various situational tests with 
this particular population and this particular 
program: (a) extent of active and aggressive
participation in the interpersonal exercises,
(b) ability as an independent decision maker
and administrator, and (c) ability to function
effectively in stressful situations and a willing-
ness to take risks.


TABLE 2

INTERCORRELATIONS^a AMONG ASSESSMENT SCALES AND CRITERIA

Variable

1. Rating Scale 1 (Activity)
2. Rating Scale 2 (Administration)
3. Rating Scale 3 (Stress resistance)
4. Relative salary standing (external criterion)
5. Managers’ potential evaluation (parallel criterion)
6. Assessment program evaluation (internal criterion)

[Entries of the correlation matrix are illegible in the source.]

Note. Decimals omitted.
^a r = .29 significant at p < .05 with df = 45.
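
The notes to Tables 2 and 3 give r = .29 as the value required for significance at p < .05 with df = 45. This threshold can be checked from the relation between r and Student's t. The sketch below assumes a two-tailed test and takes the tabled critical value t = 2.014 for df = 45 as given; the function name is illustrative.

```python
import math


def critical_r(t_critical: float, df: int) -> float:
    """Smallest |r| reaching significance, from t = r * sqrt(df / (1 - r**2))."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


T_CRIT_DF45 = 2.014  # tabled two-tailed .05 value of t for df = 45 (assumed here)

r_min = critical_r(T_CRIT_DF45, 45)
print(f"critical r = {r_min:.3f}")
```

Rounded to two decimals the result agrees with the tabled .29, so any correlation of that absolute size or larger among the 47 Ss is significant at the .05 level.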

In the hierarchical factor analysis of the 25 
traits included in the AT&T program, Bray 
and Grant (1966) identified 11 factors for 
their college graduate sample. Their analysis 
of higher order factors suggested that a 
major part of the variance was accounted for 
by several general factors reflecting overall 
program evaluations throughout most of their 
scales. However, considerable variance was 
accounted for by more specific factors. For 
the present study, the trait scales utilized for 
ratings appear more general than many of the 
AT&T rating scales and they reduce to two 
basic dimensions and possibly suggest a third. 
Based on these results, the trait ratings were 
collapsed into three summary scales and 
scores were developed for each individual: 
(a) Rating Scale 1 (Activity): the mean of
the six trait ratings loading most heavily on
Factor 1. (b) Rating Scale 2 (Administra-
tion): the mean of the four major trait ratings
for Factor 2. (c) Rating Scale 3 (Stress re-
sistance): the mean of ratings on resistance to
stress and risk taking.
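
The scoring of the three summary scales can be illustrated with a short sketch. The trait groupings follow the factor descriptions given above; the function name, the dictionary layout, and the ratings for the example participant are hypothetical.

```python
from statistics import mean

# Trait groupings follow the three factors described in the text.
SUMMARY_SCALES = {
    "Rating Scale 1 (Activity)": [
        "Persuasive or Selling Ability", "Aggressiveness", "Energy Level",
        "Interpersonal Contact", "Oral Communications", "Self-Confidence",
    ],
    "Rating Scale 2 (Administration)": [
        "Decision Making", "Planning and Organization",
        "Written Communications", "Administrative Ability",
    ],
    "Rating Scale 3 (Stress resistance)": [
        "Resistance to Stress", "Risk Taking",
    ],
}


def summary_scores(trait_ratings):
    """Return the mean trait rating for each summary scale."""
    return {
        scale: mean([trait_ratings[trait] for trait in traits])
        for scale, traits in SUMMARY_SCALES.items()
    }


# Hypothetical ratings for one participant
# (1 = outstanding, 5 = definitely below average).
example_ratings = {
    "Persuasive or Selling Ability": 2, "Aggressiveness": 3,
    "Energy Level": 2, "Interpersonal Contact": 1,
    "Oral Communications": 2, "Self-Confidence": 2,
    "Decision Making": 3, "Planning and Organization": 4,
    "Written Communications": 3, "Administrative Ability": 2,
    "Resistance to Stress": 2, "Risk Taking": 4,
}

scores = summary_scores(example_ratings)
for scale, score in scores.items():
    print(f"{scale}: {score:.2f}")
```

Because the underlying ratings run from 1 (outstanding) to 5, a lower scale score indicates a more favorable evaluation.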

Table 2 presents intercorrelations among 
these scales and the several “criteria” uti- 
lized in the research: the assessment program 
evaluation (internal criterion), the personnel 
jacket evaluation of potential (parallel cri- 
terion), and a current value criterion repre- 
senting relative current salary standing, in 
thirds, of the individual in comparison with 
his peers (external criterion). Table 2 also
shows partial correlations between each scale
and the internal criterion, controlling for the 
correlation with the parallel criterion as an 
indication of the contribution of criterion vari- 
ance by each assessment program scale over 
and above variance associated with already 
available data in personnel records. Table 3 
presents the correlations of the paper-and- 
pencil test scores—the two ability tests (Con- 
cept Mastery and SCAT—Numerical) and 20 
scales derived from the battery of person- 
ality tests—and each of the three summary 
rating scales and the three criteria, as well 
as partial correlations of test scores with the 
internal criterion controlling for the parallel 
criterion. 

As Table 3 suggests, there is some parallel 
between ratings obtained from the situational 
tests and the paper-and-pencil personality 
tests which lends a certain amount of credence 
to the constructs being measured. This is at 
least the case with regard to the construct of 
interpersonal activity as assessed by rating 
Scale 1 and by such personality test scales as 
GPP Ascendency, Background Survey—Self- 
Confidence, or SV Political which correlate 
significantly with the situational test ratings, 
suggesting that self-perceptions as measured 
with the personality tests and observer per- 
ceptions are to at least some extent parallel 
for this particular group. (It should be recog- 
nized that there was no contamination of the 
ratings from the personality test scores since 
these were not yet scored at the time the 
ratings were compiled.) There also appears to 
be some parallel between Scale 3 and such 
test scores as SDI—Occupational Level, Risk 
Taking, or SDI Initiative, also lending confi- 
dence to the constructs being evaluated. There 
is less evident parallel between test data and
situational data for Scale 2. The mental abil-
ity tests, Concept Mastery and SCAT (Quan-
titative), appear to be unique in this array of
data and have their highest correlations with
one another (r = .31).


TABLE 3

CORRELATIONS^a OF TEST SCALES WITH ASSESSMENT PROGRAM RATING SCALES AND WITH CRITERIA

Columns 1-3: Rating Scales 1, 2, and 3.
Column 4: Relative salary standing (external criterion).
Column 5: Manager’s potential evaluation (parallel criterion).
Column 6: Assessment program evaluation (internal criterion).
Column r6.5: Assessment partial correlation.

                                          Rating scale      Criterion measure
Test scale                                 1     2     3     4     5     6   r6.5

Concept Mastery (Total)                   06    04   -04    24   -02    10    12
SCAT-Numerical                           -23   -03   -20    03   -13    03    10
Study of Values-Theoretical               04    25    08   -02    16    02   -06
Study of Values-Economic                  17    26    18   -09    25   -01   -14
Study of Values-Aesthetic                -14   -01   -04   -04   -05   -14   -13
Study of Values-Social                   -17   -17    08    00   -28   -23   -12
Study of Values-Political                 38     ?    17    16    33    25    12
Study of Values-Religious                -18   -06   -10    04   -05   -18   -18
Background Survey-Management Key         -17    10    12    05    22    09   -01
Background Survey-Self-Confidence Key     43    25    25   -05    31    27    15
GPP-Ascendency                            56    26    30    10    29    43    31
GPP-Responsibility                       -34    01   -18    08   -26   -19   -08
GPP-Emotional Stability                  -25   -09   -09   -06   -37   -32   -18
GPP-Sociability                           26    28    21    01    37    23    07
LOQ-Initiating Structure                  04    00    16    01    03   -13   -16
LOQ-Consideration                        -15   -11    09    10   -13   -24   -20
SDI-Intelligence                           ?     ?     ?    14    09    26    25
SDI-Supervisory Qualities                 29    12    07    26    06    32     ?
SDI-Initiative                            17    12    36   -10    08    10    07
SDI-Self-Assurance                        25    32    18    15    05    33    35
SDI-Occupational Level                    42    28    44    13     ?    47    39
Risk Taking                               36    26    36   -02    29    42    34

Note. Decimals omitted. ? = entry illegible in the source.
^a r = .29 significant at p < .05 with df = 45.

While there was far from perfect agreement 
between the two management representatives 
who developed ratings of management poten- 
tial from the 47 personnel jackets—their rat- 
ings correlated only .56—a mean of their in- 
dividual predictions (the parallel criterion) 
does appear to cover much of the same ground 
that is covered by the 2-day program (the 
internal criterion), with one significant excep- 
tion. As Table 2 shows, these two independent 
overall assessments of management potential 
correlate .46. Based on a comparison of the 
relationships between the rating scales and 
tests from the assessment program and each
of these overall evaluations, it is evident that
they are very similar except for Rating Scale 
1 dealing with interpersonal relationships; 
this scale correlates .78 with the overall pro- 
gram evaluation and .49 with the personnel 
jacket evaluation. This is hardly unexpected 
since the evaluators utilized only documents 
from the personnel file and did not interview 
and in most cases did not know personally 
the individuals they were evaluating. But for 
the other scales they seem to have overlapped 
fully as much predictor variance as the 2-day 
program. 
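
The partial correlations referred to above can be illustrated with the standard first-order formula, which removes from the correlation between a scale and the internal criterion whatever both share with the parallel criterion. The sketch below applies it to the three coefficients quoted in the text for Rating Scale 1 (.78, .49, and .46); the function and variable names are illustrative, and the formula is assumed rather than stated in the original.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_scale1_internal = 0.78    # Rating Scale 1 with the internal criterion
r_scale1_parallel = 0.49    # Rating Scale 1 with the parallel criterion
r_internal_parallel = 0.46  # internal criterion with the parallel criterion

r_partial = partial_correlation(
    r_scale1_internal, r_scale1_parallel, r_internal_parallel
)
print(f"partial r (Scale 1, internal | parallel) = {r_partial:.2f}")
```

A partial coefficient that stays well above the .29 significance threshold indicates that the scale carries criterion variance not already contained in the personnel-record ratings.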


DISCUSSION


This analysis suggests that there is prob- 
ably not a great deal of intraindividual dis- 
crimination developed in the situational exer- 
cises utilized in the management assessment 
program described for this particular popula-
tion in this particular organization. There
seems to be considerable overlap among the 
12 trait ratings, and it appears that a lim- 
ited number of concepts were actually evalu- 
ated. 

Most of the scales derived from the several 
personality tests appear to make little unique 
contribution to the assessment evaluation 
(correlations with the internal criterion for 
14 of the 20 scales are nonsignificant, and the 
largest is for the SDI Occupational Level as 
reported in Table 3). These results are prob- 
ably representative of the very severe prob- 
lems in the area of personality testing as dis- 
cussed by Guion and Gottier (1965). While 
data for a few of the tests appear promising 
(e.g., GPP, SDI, Risk taking), from multiple 
regression analyses it is not clear that they 
provide much incremental variance over situ- 
ational measures in explaining the criteria uti- 
lized in this study. The patterns of overlap 
between these test scales and the situational 
rating data evident in Table 3 suggest a similar 
interpretation. This, of course, does not 
address the issue of the utility of these per- 
sonality test scales in a long-term predictive 
validation. 

The major component of assessment—both 
from situational and from personality tests— 
seems to be an evaluation of interpersonal be- 
havior. Mental ability measurements con- 
tribute essentially nothing to this prediction 
of managerial success. 

The data suggest that traditional ap- 
proaches to the assessment of management 
potential in the form of a careful evaluation 
of personnel records and employment history 
(our parallel criteria) can perhaps provide 
much of the same information which evolves 
from the lengthy and expensive 2-day assess- 
ment program, as many of the evaluation 
components which emerge from the program 
correlate as highly with this rating as they 
do with the internal criterion. However, the 
partial correlations indicate that these are not 
completely equivalent evaluations of mana- 
gerial potential. The biggest discrepancy is 
for Rating Scale 1—the evaluation of inter- 
personal behavior. Quite possibly, if this tra- 
ditional assessment were accompanied by an 
extensive personal interview, as would be the 
case in real life, traditional approaches might 
be found to be even more effective in comparison 
with the situational exercises. The 
results suggest that such a systematic evalua- 
tion of past history might fruitfully be in- 
cluded in overall assessments of management 
potential compiled in programs such as this. 

This comparison raises an issue which can 
easily be overlooked in the rush to institute a 
management assessment program: a clear defi- 
nition of just what it is hoped the program 
will do and where in an individual’s career it 
can be most fruitfully utilized. If such a pro- 
gram is looked to for evaluating candidates 
for middle management or higher positions, 
then one may justifiably question the appro- 
priateness of two days of situational exercises 
for evaluating management potential as op- 
posed to a careful review of prior job history 
and accomplishment. On the other hand, if 
the focus is on the early identification of potential 
where little job history has accrued, 
then the assessment center is probably a very 
effective means of synthesizing a rather close 
approximation to the type of potential predic- 
tion which would eventually evolve through 
on-the-job performance. And, as hinted at the 
outset, there are numerous other potential 
benefits which can be derived from such a 
program among relatively young premanage- 
ment candidates. 

The analysis still does not adequately address 
the question of validity, and the ultimate 
test of the utility of such an approach would 
have to be based on a careful predictive vali- 
dation strategy—an approach which all too 
often is essentially impossible considering the 
time span of the prediction involved and the 
pressures in real-life organizations for im- 
mediate evaluation of programs and imple- 
mentation of new techniques. However, the 
lack of concurrent validity in this analysis 
argues strongly for additional research to 
assess the predictive validity of this particu- 
lar management assessment methodology in 
this particular setting. While in all proba- 
bility our external criterion of relative salary 
standing was not as reliable a criterion of current 
worth as one might wish, there was also 
no correlation between program ratings and 
other concurrent criteria of performance rat- 
ings or normalized rankings. 
This is not necessarily an indictment of the 
program’s possible value and perhaps should 
be viewed positively since the program’s in- 
tent was predictive rather than concurrent 
validity, and the rating which was evolved 
was of a future criterion variable. In this sit- 
uation, a fair validation would require allow- 
ing adequate time for meaningful criterion 
variance to develop. The AT&T data and 
other unpublished results suggest that predic- 
tive validities improve as the time interval 
between data collection and criterion measurement 
approximates the time span for the predictive 
assessment (Bray & Grant, 1966, p. 18).2 

At the very least, the lack of demonstrated 
concurrent validity in this study calls for con- 
tinued validation research to ensure proper 
utilization of this type of information which 
potentially can have such far-reaching impli- 
cations for individual careers. The present 
analysis does suggest, however, that the as- 
sessment center evaluation contains reliable 
variance which may be associated with man- 
agement potential. And the research suggests 
some areas for improvement of the program 
within this particular environment. 

2 In this regard, it is interesting that the personality 
test scales in this study which are most strongly 
correlated with the assessment evaluation of managerial 
potential clearly have the flavor of upward 
mobility: occupational level, ascendency, risk taking, 
self-assurance, or supervisory qualities. 
PROCESSING AUDITORY INFORMATION: 
INTERFERENCE FROM AN IRRELEVANT CUE 


J. RICHARD SIMON 1 AND A. M. SMALL, JR. 2 


University of Iowa 


In a choice RT task, 64 Ss pressed either a right- or left-hand key in response 
to directional commands provided by 400 and 1000 cps tones. On monaural 
trials, RT was significantly faster when the meaning of the tonal command 
corresponded with the ear in which it was heard (corresponding trials) than 
when it did not (noncorresponding trials). A comparison of monaural with 
binaural RT indicated that this Tonal Command X Ear Stimulated interaction 
was due to interference on the noncorresponding monaural trials rather than 
facilitation on the corresponding trials. 


In an experiment concerned with reaction 
time (RT) to monaurally presented verbal 
commands, Simon and Rudell (1967) discov- 
ered an extremely potent phenomenon, 
namely, that the speed of processing the sym- 
bolic content of a command was affected by 
the ear in which the command was heard. 
Their Ss responded significantly faster when 
the content of the command corresponded to 
the ear stimulated (i.e., “right” in right ear 
or “left” in left ear) than when it did not 
(i.e., “right” in left ear or “left” in right 
ear). Results clearly suggested that the audi- 
tory display provided two cues, one relevant 
(content of command) and the other irrele- 
vant (ear stimulated), and that the time re- 
quired to process the former was somehow 
affected by the presence of the latter. Left 
unclear was the basic nature of the cue pro- 
vided by the ear stimulated. Did the cue 
facilitate information processing on trials 
where it corresponded with the symbolic con- 
tent of the command or did it interfere with 
information processing on trials where it did 
not correspond with the content of the com- 
mand? Could the cue have operated both to 
facilitate responding on the corresponding 
trials and to interfere with responding on the 
noncorresponding trials? In other words, how 
was the Command x Ear Stimulated interac- 
tion produced? The primary purpose of the 
present experiment was to answer these questions. 
A second purpose of this study was to 
determine the generality of the Command X 
Ear Stimulated interaction. Is the phenomenon 
limited to situations involving verbal 
commands or does it also occur when simple 
stimuli such as pure tones are used to provide 
the relevant directional information? 

1 Requests for reprints should be sent to J. Richard 
Simon, Department of Psychology, University of 


METHOD 


Apparatus. The apparatus provided a measure of 
choice RT to a series of tones presented to S 
through Telephonics TDH-39, 300 ohm earphones. 
The earphones were mounted in NAF-48490-1 
cushions and fixed to a standard headband. The Ss’ 
task was to press the correct one of two finger keys 
as soon as possible after hearing the tone. A 
Hunter klockounter started when the tone was 
presented and stopped when S pressed a key. De- 
pressing the key also signaled E as to which key 
(or keys) had been pressed. The keys, which S 
operated with his right and left index fingers, were 
mounted 12 in. apart on a table in front of him. 
Two Hewlett-Packard Model 200AB audio oscil- 
lators were used to generate tones of 400 and 1000 
cps. Silent switches permitted E to present one tone 
or the other to either the left ear, the right ear, or 
to both ears simultaneously. On monaural trials, 
the output SPL was 99 db. On binaural trials, the 
output SPL was reduced to 93 db. so as to yield 
approximately the same loudness as on the monaural 
trials (Caussé & Chavasse, 1942). A warning light 
was presented 2 sec. prior to the onset of each tone, 
and there was a 7-sec. interval between trials. 

Subjects. The Ss were 32 male and 32 female Uni- 
versity of Iowa undergraduate volunteers. All Ss 
reported having normal hearing. 

FIG. 1. Reaction time to tonal directional commands 
as a function of ear(s) stimulated. 

Procedure and experimental design. Each S performed 
on two blocks of trials, one block involving 
monaural stimulation and the other involving binaural 
stimulation. On the monaural trials, either a 
high-pitched tone (1000 cps) or a low-pitched tone 
(400 cps) was presented to one ear. The Ss had no 
way of knowing, prior to the presentation of a tone, 
which ear would be stimulated or what the tone 
would be. Half of Ss were instructed to press the 
right key when they heard the high-pitched tone 
and to press the left key when they heard the low- 
pitched tone. The other half of Ss were given the 
opposite tone-key rule. There were 56 monaural test 
trials in which the 400 and 1000 cps tones were 
presented equally often to each ear in a predeter- 
mined random sequence. The test trials were pre- 
ceded by eight practice trials in which each tone 
was presented twice to each ear in a random se- 
quence. On the binaural trials, either the 400 or the 
1000 cps tone was presented to both ears simultane- 
ously. Eight practice trials and 56 test trials were 
given in the same random sequence used in the 
monaural block. Each S performed both monaural 
and binaural blocks using the particular tone-key 
rule to which he was originally assigned. 
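
The balanced monaural test block described above can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function name, the seed, and the use of a pseudorandom shuffle are assumptions. The text states only that the 400 and 1000 cps tones were presented equally often to each ear in a predetermined random sequence of 56 test trials.

```python
import random
from collections import Counter
from itertools import product

TONES_CPS = (400, 1000)      # tone frequencies given in the Method section
EARS = ("left", "right")     # ear stimulated on a monaural trial


def build_monaural_block(n_trials=56, seed=0):
    """Return a shuffled list of (tone, ear) trials balanced over the 4 combinations.

    The fixed seed stands in for the "predetermined" order; its value is hypothetical.
    """
    combos = list(product(TONES_CPS, EARS))
    if n_trials % len(combos) != 0:
        raise ValueError("n_trials must be a multiple of the number of tone-ear combinations")
    trials = combos * (n_trials // len(combos))   # 14 repetitions of each combination
    random.Random(seed).shuffle(trials)
    return trials


block = build_monaural_block()
counts = Counter(block)   # every (tone, ear) pair occurs 14 times
```

Because the seed is fixed, the same sequence can be reused for every S, which is one way to read "predetermined."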

Half of the males and half of the females per- 
formed the monaural trials first while the other half 
performed the binaural trials first. Each sex X se- 
quence group was further subdivided by assigning 
eight Ss to one tone-key rule and eight to the 
opposite rule. Finally, in order to balance out any 
differences which may have existed between stimu- 
lus channels, the earphones were reversed for half 
of Ss in each subgroup. 
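
The counterbalancing scheme in this paragraph amounts to a fully crossed 2 x 2 x 2 x 2 layout. The sketch below only enumerates the cells and checks the group sizes stated in the text. The cell labels are hypothetical, and the assumption that the 64 Ss were spread evenly over all 16 cells follows from, but is not spelled out in, the description.

```python
from itertools import product

SEXES = ("male", "female")
ORDERS = ("monaural_first", "binaural_first")
RULES = ("high_tone_right_key", "high_tone_left_key")   # hypothetical labels
EARPHONES = ("normal", "reversed")

N_SUBJECTS = 64

# Every combination of sex, block order, tone-key rule, and earphone arrangement.
cells = list(product(SEXES, ORDERS, RULES, EARPHONES))
subjects_per_cell = N_SUBJECTS // len(cells)

# Sizes of the coarser groupings named in the text.
per_sex = subjects_per_cell * len(ORDERS) * len(RULES) * len(EARPHONES)
per_sex_by_order = subjects_per_cell * len(RULES) * len(EARPHONES)
per_rule_subgroup = subjects_per_cell * len(EARPHONES)
```

Under these assumptions each sex by sequence group holds 16 Ss, each tone-key subgroup holds 8, and reversing the earphones for half of each subgroup leaves 4 Ss per cell.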


RESULTS 


Median RTs were computed for each S for 
each of the six treatment conditions, that is, 
right and left tonal commands in the right 
ear, the left ear, and in both ears simultane- 
ously. An analysis of variance of the monaural 
trials revealed no differences as a function of 
ear stimulated, tonal command, sex, or order. 
The only significant effect was the expected 
tonal command X ear stimulated interaction, 
F (1, 60) = 276.35, p < .001. Figure 1 shows 
that RT was markedly faster when the right 
command was heard in the right ear than 
when it was heard in the left ear (370 vs. 432 
msec.). Similarly, RT to the left command 
was faster when it was heard in the left ear 
than when it was heard in the right (371 vs. 
439 msec.). Clearly, Ss responded signifi- 
cantly faster on trials where the symbolic 
content of the command corresponded with 
ear stimulated (corresponding trials) than on 
trials where it did not (noncorresponding). 

Since the major purpose of the experiment 
was to determine whether the tonal com- 
mand X ear stimulated interaction reflected 
a facilitation of information processing on 
the corresponding trials or an interference 
with information processing on the noncorre- 
sponding trials, additional comparisons of 
binaural with monaural RT were conducted. 
Binaural RT (356 msec.) was significantly 
faster than average RT on the noncorrespond- 
ing monaural trials (435 msec.)—F (1, 60) 
= 269.97, p < .001. Binaural RT was also 
significantly faster than average RT on the 
corresponding monaural trials (371 msec.)— 
F (1, 60) = 11.82, p < .01. Right responses 
tended to be faster than left responses, but 
this difference reached significance (p < .05) 
in only one of the analyses. 


DISCUSSION 


Results of this experiment clearly indicated 
that the tonal command X ear stimulated 
interaction on the monaural trials was a 
result of interference on the noncorresponding 
trials rather than facilitation on the corre- 
sponding trials. This conclusion was reached 
after comparing binaural RT with RT on 
both noncorresponding and corresponding 
monaural trials in turn. On the monaural 
trials, the ear in which the tone was heard 
provided an irrelevant cue which, of course, 
was absent on the binaural trials. Thus, bi- 
naural RT provided an appropriate baseline 
for evaluating the effect of the irrelevant cue. 
On the noncorresponding monaural trials, the 
irrelevant cue (ear stimulated) apparently 
conflicted with processing the relevant cue 
(frequency of the tone) resulting in slower 
RT than on the binaural trials. On the corre- 
sponding monaural trials, the irrelevant cue 
coincided with the relevant cue (i.e., right 
command in right ear and left command in 
left ear), but this correspondence did not 
facilitate information processing. In fact, RT 
on the corresponding monaural trials was also 
significantly slower than binaural RT. The 
reason for this latter finding is not clear. Per- 
haps, the corresponding monaural trials were 
slowed by the presence of the noncorre- 
sponding trials in the same block. Alterna- 
tively, it may be that binaural stimulation 
per se results in faster RT than monaural 
stimulation. 
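
The baseline logic of this paragraph can be restated as simple arithmetic on the mean RTs reported in the Results section. The sketch below is illustrative only; the function and variable names are assumptions, and it says nothing about statistical significance, which the F tests above address.

```python
# Mean RTs (msec.) as reported in the Results section.
BINAURAL_RT = 356            # baseline: no ear-stimulated cue available
CORRESPONDING_RT = 371       # monaural, command matches the ear stimulated
NONCORRESPONDING_RT = 435    # monaural, command conflicts with the ear stimulated


def cost_relative_to_baseline(monaural_rt, baseline_rt=BINAURAL_RT):
    """Positive values mean slower than the binaural baseline; negative values mean faster."""
    return monaural_rt - baseline_rt


noncorresponding_cost = cost_relative_to_baseline(NONCORRESPONDING_RT)
corresponding_cost = cost_relative_to_baseline(CORRESPONDING_RT)
correspondence_effect = NONCORRESPONDING_RT - CORRESPONDING_RT
```

Facilitation would show up as a negative cost on the corresponding trials. Both costs are positive here, which is why the interaction is attributed to interference.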

Another important outcome of this experi- 
ment was the demonstration that the com- 
mand X ear stimulated interaction, hereto- 
fore only observed with verbal directional 
commands (Simon & Rudell, 1967; Simon, 
1968), also occurred when pure tones were 
used to signal the appropriate response. Thus, 
it appears that the interaction reflects a basic 
and general phenomenon which exists independently 
of whether the command is communicated 
verbally or nonverbally. It also 
appears that the interaction is unrelated to 
prior symbolic associations since, in contrast 
to verbal directional commands, the tones had 
no implicit directional significance. While 
much remains to be learned about the exact 
nature of the interference phenomenon, re- 
sults to date clearly underscore its potency 
and emphasize the importance of hitherto 
unrecognized spatial cues in decoding audi- 
tory displays. 
IMAGE OF INDUSTRIAL PSYCHOLOGY AMONG 
PERSONNEL ADMINISTRATORS 


GEORGE C. THORNTON III 


Colorado State University 


A survey was conducted of impressions of industrial psychologists among a 
national sample of personnel administrators. Results showed 11% of all com- 
panies and 20% of the largest ones employ a psychologist full time, and 25% 
employ one or more as a consultant. Fifty percent of the respondents felt it 
would be desirable to have a psychologist in the company and 75% felt he 
would increase productivity and satisfaction. Ratings of perceived past con- 
tributions, future usefulness, and need for further research in 12 areas of 
specialization are presented. Comparisons are made with previous surveys over 
a 20-yr. period. 


While it has been noted generally that 
psychologists working in industry have been 
utilized more frequently and more broadly 
over the past decades, there is little systematic 
evidence of these trends. This paper reports 
a survey of a large sample of personnel 
administrators, representative of all American 
industries, regarding their impressions of the 
current use and potential contributions of 
psychology in industry. Comparisons are made 
with previous surveys (Feinberg & Lefkowitz, 1962; Stagner, 
1946; Tiffin & Prevratil, 1956). 


METHOD 


Questionnaire. A questionnaire was constructed to 
obtain the following information: demographic data 
about the respondent (age, amount and type of 
education, and authority to hire an industrial psy- 
chologist), descriptive data about the company 
(size, industry, amount of unionization, size of 
personnel department, and whether it employed an 
industrial psychologist), and the respondent’s im- 
pressions of industrial psychology (general rating 
of contribution toward increased productivity and 
employee satisfaction, and specific rating of past 
contribution, future contribution, and need for fu- 
ture research in 12 areas of specialization). 

Subjects. The Ss were a 20% random sample of 
personnel administrators chosen from the 1966 direc- 
tory of the American Society for Personnel Adminis- 
tration. Of the 600 questionnaires mailed, 319 usable 
ones were returned. The companies employing the 
respondents and nonrespondents did not differ sig- 
nificantly in size and region of the country. 


1 Requests for reprints should be sent to the 
RESULTS 


The respondents were typically in their 30’s and 
40’s, had a bachelor’s degree (54%), were 
trained in business administration or indus- 
trial relations (43%), and had the authority 
to employ a psychologist (37%). Only 12% 
of the sample of Feinberg and Lefkowitz 
(1962) reported such authority. The employ- 
ing companies consisted of approximately 
equal numbers of small (less than 1,000 em- 
ployees), medium (1,000-4,999), and large 
(over 5,000) companies having a wide range 
of union representation and sizes of personnel 
departments. 

Approximately half (46%) of the com- 
panies employ an industrial psychologist full 
time or on a consulting basis. Tiffin and 
Prevratil (1956) found the comparable figure 
to be 28.8%, the difference being significant 
at the .01 level. Eleven percent of all com- 
panies and 20% of the largest companies em- 
ployed an industrial psychologist full time. 
Stagner (1946) in his survey of large corpo- 
rations found that 30% employed a profes- 
sionally trained psychologist full time. 
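
A rough check on the comparison with the 1956 figure can be sketched as a one-sample test of a proportion. This is not the test the author reports using, which is not described here. Treating 28.8% as a fixed reference value is an assumption made only because the size of the earlier sample is not given in this text, and it overstates the evidence compared with a two-sample test.

```python
import math


def one_sample_proportion_z(p_observed, p_reference, n):
    """z statistic and two-sided p value for an observed proportion against a fixed reference."""
    standard_error = math.sqrt(p_reference * (1.0 - p_reference) / n)
    z = (p_observed - p_reference) / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))   # normal approximation
    return z, p_two_sided


# 46% of the 319 usable returns versus the 28.8% reported for 1956.
z, p_value = one_sample_proportion_z(p_observed=0.46, p_reference=0.288, n=319)
```

Under these assumptions z is well beyond the value needed for the .01 level, in line with the statement in the text.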

The questions and responses concerning the 
impressions of personnel administrators toward 
psychologists are contained in Table 1. One- 
half of the sample felt it would be desirable 
to have an industrial psychologist actually in 
the company, and about three-fourths felt 
such a person could be useful in increasing 
both worker productivity and satisfaction. 

TABLE 1 

RATINGS OF DESIRABILITY, USEFULNESS, AND 
NEEDED RESEARCH OF INDUSTRIAL PSYCHOLOGISTS 
(N = 319) 

                                                     Yes     No 
Would (Do) you consider it desirable to have a 
  professionally trained industrial psychologist 
  in your company?                                   53%     43% 
Do you think the services of an industrial 
  psychologist could be useful in your company 
  in increasing (a) productivity?                    72%     22% 
                (b) satisfaction?                    76%     18% 

                                        Past                  Needed 
Subareas                                value    Usefulness   research 
Employee selection                      42%      46%          31% 
Employee training                       20%      32%          20% 
Managerial selection                    47%      59%          39% 
Managerial training                     32%      59%          39% 
Performance appraisal                   20%      38%          38% 
Job evaluation                          10%      14%           8% 
Labor relations                         14%      15%          14% 
Employee motivation/attitude 
  (morale) surveys                      24%      60%          47% 
Safety and accident prevention           4%      11%          13% 
Organization analysis and planning      17%      25%          24% 
Human factors engineering               11%      34%          25% 
Consumer behavior                        2%       8%           9% 
Other                                   17%       3%           5% 

In considering the specific areas where psychologists 
have been of value in the past, Ss most 
frequently checked employee and managerial 
selection (42% & 47%), employee and managerial 
training (20% & 32%), motivation 
and morale surveys (24%), and performance 
appraisal (20%). In the areas of 
safety, labor relations, and consumer behavior, 
few of the sample saw any contribution. 
In terms of usefulness in the future, the same 
general pattern is noted. In the areas of 
managerial training, performance appraisal, 
employee motivation, and human factors en- 
gineering, there seems to be the feeling that 
more could be done in the future than has 
been done in the past. The evaluations of potential 
usefulness in the specific areas by this 
group are markedly similar to those of the 
previous samples. One area, organizational 
analysis and planning, was included in the 
present study and not in previous studies. 
Twenty-five percent felt psychologists could 
be of service here, in comparison with Fein- 
berg and Lefkowitz’s (1962) conclusion: 
“The executives in our sample never hired a 
psychologist to deal with broad categories 
such as research or organizational structure 
[p. 110].” 
In terms of needed research, the areas of 
motivation and attitude surveys were rated 
highest (47%). A number of other areas 
were checked by 25-35% of the group. The 
areas of job evaluation, safety, labor rela- 
tions, and consumer behavior were checked 
very infrequently. 


DISCUSSION 


It is felt the sample was a representative, 
appropriate, and influential one, and thus al- 
lows meaningful statements about the current 
impressions of personnel administrators to- 
ward psychologists in industry. 

While it does not appear that the percent- 
age of larger companies employing a full-time 
psychologist has increased over the past sev- 
eral years, it may be that larger numbers of 
smaller ones are, and it seems that more 
companies of all sizes are using psychological 
consultants. When executives have been 
asked over the past 20 yr. if they consider it 
desirable to have a professionally trained psy- 
chologist in their company, an affirmative 
answer has been given by virtually the same 
percent of respondents: 1946—53%; 1956— 
54.5%; 1962—66%; and 1967—53%. Even 
though 50% think it would be desirable to 
employ an industrial psychologist, only 11% 
do so. This would indicate there is a great 
opportunity for qualified and interested per- 
sons. Stagner (1946) found the same gap 20 
years ago and came to the same conclusion. 

The large majority of the personnel ad- 
ministrators felt that the services of a psy- 
chologist are useful. It was somewhat sur- 
prising that the evaluations of the specific 
areas were so similar to the evaluations in 
previous studies. Industrial psychologists have 
made advancements in understanding in many 
areas, but it seems these have gone unrecog- 
nized or have not been successfully applied 
to personnel problems. It should not be in- 
ferred that the need for research has dimin- 
ished; the results of this study alone would 
argue that personnel administrators are look- 
ing for advances in a number of different 
areas through additional research. 
PREDICTION OF JOB PERFORMANCE FROM 
ASSESSMENT REPORTS: 
USE OF A MODIFIED Q-SORT TECHNIQUE TO EXPAND 
PREDICTOR AND CRITERION VARIANCE 1 


GARLAND Y. DENELSKY 2 AND MICHAEL G. McKEE 3 


Central Intelligence Agency 


Predictions of performance and personality characteristics made on the basis 
of preemployment psychological assessment reports were compared with subse- 
quent performance evaluations contained in the fitness reports of 32 govern- 
ment employees. Seven psychologists reviewed the assessment reports as a 
basis for predicting overall job effectiveness and specific performance and 
personality characteristics. They then reviewed the narrative section of each 
individual’s fitness report as a basis for rating the overall effectiveness of each 
person. Ratings were made using a modified Q-sort technique that reliably 
expanded the variances of predictor and criterion variables. A significant posi- 
tive relationship was found between predicted and actual effectiveness. In addi- 
tion, the psychologists were able to predict specific performance and personality 
dimensions on a significantly better than base-rate basis. 


Over the past 20 years, with the 1948 Office 
of Strategic Services volume, Assessment of 
Men, lighting the way, there has been a steady 
if slow flow of research on the predictive va- 
lidity of clinical assessment, using multiple 
methods for obtaining information about indi- 
viduals. Taft (1959) provides a comprehen- 
sive review of the earlier studies. Studies by 
Bray and Grant (1966), Hilton, Bolin, 
Parker, Taylor, and Walker (1955), Camp- 
bell, Otis, Liske, and Prien (1962), Trankell 
(1959), Dicken and Black (1965), and 
Albrecht, Glaser, and Marks (1964) report 
significant positive correlations between assessment 
predictions and performance criteria. 
The results of some studies, however, have 
cast doubt upon the predictive efficacy of 
assessment procedures (Holtzman & Sells, 
1954; Kelly & Fiske, 1951). 

1 The views expressed in this article are those of 
the authors and do not necessarily reflect an official 
position of the Central Intelligence Agency. 

2 Requests for reprints should be sent to Garland Y. 
DeNelsky, Central Intelligence Agency, Washington, 
D. C. 20505. 

3 Now at the Cleveland Clinic, Cleveland, Ohio. 

Bray and Grant (1966) summarized the 
research to date as follows: 
Though no firm conclusions regarding the predictive 
validities of multiple assessment procedures can be 
drawn from the rather mixed findings of published 
research, it does appear clear that the more accurate 
predictions were obtained where the performance to 
be predicted was clearly defined, the assessment re- 
sults did not restrict the range of subsequent criterion 
performance, and the criterion measures employed 
were not limited by low reliability and questionable 
validity [p. 2]. 


Unfortunately, it is usually impossible to 
meet the above conditions in applied assess- 
ment; the job duties are heterogeneous and 
ill defined; criterion performance is restricted 
in range by selection on the basis of assess- 
ment results; the criterion measure is based 
on standard organizational evaluation reports 
and, as such, is of questionable validity. A 
variety of raters and a variety of jobs, with 
the clearly inept performers screened out, tend 
to lower the correlations of predictors and 
rated job performance. Many elements in a 
study of assessment au naturel coalesce to 
lower validity, and the question is whether 
assessment has value within these limitations 
and whether it can predict performance in an 
ongoing occupational setting. 

The purpose of the present study was to 
determine if predictive validity can be demon- 
strated for psychological assessments within 
a natural setting when a special rating tech- 
nique that increases predictor and criterion 
variability is used. The specific focus of in- 
vestigation was the assessment report; the 
major question was whether preemployment 
psychological assessment reports do predict 
the subsequent performance of those indi- 
viduals who are hired. 


METHOD 
Subjects 


Fitness reports (routine performance evaluations 
about one-half page in length) were obtained on 32 
male employees who had been working overseas for 
1 yr. or more. Assessment reports were available 
on all 32. These individuals had been assessed 12-57 
mo. earlier by one of eight psychologists; the median 
interval between assessment and fitness reports was 
20 mo. The original assessments varied slightly from 
case to case but typically included intellectual, per- 
sonality, attitudinal, and interest testing in addition 
to one or more depth interviews. The assessment 
reports were typically one or two pages long and 
contained descriptions of the individual’s strengths 
and weaknesses as well as a summary recommenda- 
tion. 

All 32 men were overseas at the time their fitness 
reports were prepared. Although it was not possible 
to determine how many different supervisors had 
actually been responsible for this group, it was estab- 
lished that none of the field supervisors had seen 
their assessment reports. The total of 32 men was 
divided into two groups. Each of these groups (which 
will be referred to as Group 1 and Group 2) con- 
tained 16 men. The two groups were judged sepa- 
rately; in fact, several months intervened between 
the judging of Group 1 and Group 2. 

Seven staff psychologists served as judges. All had 
experience in assessing overseas candidates. 


Procedure 


Trait prediction. In the first phase of the study 
for both groups, each of the judges was given the 16 
original assessment reports, together with a specially 
designed Trait Rating Sheet for each S. The Trait 
Rating Sheet listed 25 performance and personality 
traits that had been abstracted from the narrative 
sections of the total group of fitness reports of the 
employees in the study. Performance ratings included 
such dimensions as response to supervision, accuracy 
of work, speed of learning, and supervisory effective- 
ness; personality ratings included such dimensions as 
judgment, maturity, flexibility, and self-confidence. 
Approximately half of the 25 dimensions could be 
described as personality variables; the other half 
pertained to job performance. The judges were 
instructed to form an impression of each of the men 
from the assessment report, and, on the basis of 
this impression, to predict whether each individual 
would be discussed favorably or unfavorably on 
each trait in his fitness report (assuming, of course, 
that he would be discussed on all dimensions—a 
slightly unrealistic situation since no employee was 
mentioned on more than 12 of the 25 dimensions). 
For those individuals mentioned favorably or un- 
favorably on a given dimension in their fitness 
reports, it was possible to determine if the predic- 
tions made by psychologists were in the same direc- 
tion as the actual descriptions of the individuals 
in their fitness reports. 
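
The directional scoring described in this paragraph can be sketched as follows. The function names, the trait entries, and the example values are invented for illustration and are not data from the study. The definition of base rate used here (always predicting the more common direction) is an assumption, since this section does not define it.

```python
def direction_hit_rate(predicted, actual):
    """Share of mentioned traits whose predicted direction matches the fitness report.

    predicted: dict trait -> "favorable" or "unfavorable" (a prediction for every trait)
    actual:    dict trait -> "favorable" or "unfavorable" (only traits actually mentioned)
    """
    if not actual:
        raise ValueError("no traits were mentioned in the fitness report")
    hits = sum(1 for trait, direction in actual.items() if predicted[trait] == direction)
    return hits / len(actual)


def base_rate(actual):
    """Hit rate obtained by always predicting the more common direction (assumed definition)."""
    favorable = sum(1 for direction in actual.values() if direction == "favorable")
    return max(favorable, len(actual) - favorable) / len(actual)


# Invented example for one employee; only mentioned traits can be scored.
predicted = {"judgment": "favorable", "maturity": "favorable",
             "flexibility": "unfavorable", "accuracy of work": "favorable"}
actual = {"judgment": "favorable", "flexibility": "unfavorable",
          "accuracy of work": "unfavorable"}

hit_rate = direction_hit_rate(predicted, actual)
chance_level = base_rate(actual)
```

Traits that a fitness report never mentions, such as "maturity" in this example, simply drop out of the comparison, as the text notes.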

Q sorts of assessment and fitness reports. Following 
his completion of the Trait Rating Sheets, each judge 
was asked to sort the assessment reports of the 16 
men of each group into five categories corresponding 
to his prediction of each individual’s overall effec- 
tiveness in a typical overseas work situation of the 
type to which these men were assigned. In order 
to eliminate variance due to differing frames of 
reference on the part of the seven judges, a modified 
Q-sort distribution was used; assessment reports were 
to be assigned to five categories, ranging from a pre- 
dicted worst performance to a predicted best per- 
formance with 1, 4, 6, 4, and 1 individuals assigned 
to the respective categories. Score values of 1, 2, 3, 4, 
and 5 (best) were assigned to the five categories.
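
The forced distribution described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the data are made up, and treating the composite as a simple sum over judges is an assumption that the text does not state explicitly.

```python
from collections import Counter

# Quotas from the text: categories scored 1 (worst) to 5 (best)
# must hold 1, 4, 6, 4, and 1 of the 16 reports, respectively.
QUOTAS = {1: 1, 2: 4, 3: 6, 4: 4, 5: 1}


def is_valid_sort(scores):
    """Return True if one judge's 16 category scores match the quotas."""
    return dict(Counter(scores)) == QUOTAS


def composite_scores(sorts):
    """Sum each individual's score over all judges (hypothetical helper)."""
    for one_judge in sorts:
        if not is_valid_sort(one_judge):
            raise ValueError("sort does not follow the 1-4-6-4-1 distribution")
    # zip(*sorts) regroups the scores so each tuple belongs to one individual.
    return [sum(person) for person in zip(*sorts)]


# Illustrative data only: seven judges who happen to sort identically.
one_sort = [1] + [2] * 4 + [3] * 6 + [4] * 4 + [5]
composites = composite_scores([one_sort] * 7)
mean_composite = sum(composites) / len(composites)
```

Because every valid sort averages 3, a composite summed over seven judges averages 21 whatever the judges decide. Under that assumption, the forced distribution removes differences in judges' overall leniency but not differences in how they rank individuals.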

Following the Q sort of assessment reports on the 
basis of predicted overall effectiveness, each judge 
was assigned the task of Q sorting, in the same 
manner as before, each group of 16 individuals on 
the basis of actual overall effectiveness as described 
in narrative form in their fitness reports. The names
of the 16 men were deleted from the fitness reports; 
thus the judges had no way of knowing which of the 
assessment reports and fitness reports had been 
written for the same persons.

It should be noted that the prediction situation 
as structured in this study was different from the 
usual design of studies with similar objectives. 
Instead of being given test scores and other psycho- 
metric and background data and being required to 
weight this “raw” information in order to make 
predictions of future behavior, the judges in this 
study were asked to formulate predictions on the 
basis of finished assessment reports. Thus, the judges
in the present study were placed in a role similar 
to the consumer of psychological assessment reports: 
They were to make predictions on the basis of some- 
one else’s analysis and interpretation of first-hand 
data. Dicken and Black (1965) used a similar method,

TABLE 1

ANALYSIS OF VARIANCE RELIABILITY COEFFICIENTS
FOR ASSESSMENT- AND FITNESS-REPORT RATINGS

                     Coefficient for      Coefficient for
                     single rating        composite rating
Rating               Group 1   Group 2    Group 1   Group 2
Assessment report      .63       .66        .92       .93
Fitness report         .59       .74        .91       .95








commenting that “the ratings are thus two interpre- 
tive steps removed from the original test data 
[p. 36].” 


RESULTS 
Prediction of Overall Effectiveness 


Before relating assessment-report predictions 
to fitness-report ratings, it was necessary to 
establish the reliability of the judgments made 
by the judges on both measures. 

Table 1 presents the analysis of variance 
reliability coefficients for the assessment- and 
fitness-report judgments. It is evident from 
this table that the reliabilities, particularly 
of the average or composite ratings for each 
individual by all judges, are quite satisfac- 
tory. Despite several judges’ comments that 
the task of making the ratings was a difficult 
one, there was substantial agreement among 
judges on both the assessment-report and the 
fitness-report ratings. 
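
The single-rating and composite coefficients in Table 1 are related in the way the Spearman-Brown formula would predict for a composite of seven judges. The sketch below illustrates that relationship under this assumption. The source says only that the coefficients come from an analysis of variance, and the function name is hypothetical.

```python
def spearman_brown(single_r, k):
    """Reliability of a composite of k ratings, given single-rating reliability."""
    return k * single_r / (1 + (k - 1) * single_r)


# Single-rating coefficients as read from Table 1 (Group 1, Group 2).
single = {"assessment": (0.63, 0.66), "fitness": (0.59, 0.74)}

# Projected reliability of the seven-judge composite, rounded to two places.
composite = {
    name: tuple(round(spearman_brown(r, 7), 2) for r in pair)
    for name, pair in single.items()
}
```

On this reading, the high composite reliabilities follow from pooling seven moderately reliable judges rather than from any one judge being highly consistent.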

The answer to the primary question of this 
study—whether judges can predict, on the 
basis of psychological assessment reports, per- 
formance in actual field situations as judged 
from fitness-report narratives 12-57 mo. later 
—can be approached from a number of direc- 
tions. Perhaps the single most meaningful 
approach is to correlate the composite assess- 
ment-report predictions of the seven judges 
for each of the 16 individuals in each group 
with the composite judged effectiveness of 
the same individuals based on fitness reports. 
The resulting correlations, presented in Table 
2, indicate that with the total sample of 32 
men, there is a significant positive relationship 
between the overall or composite predictions 
of effectiveness based on assessment reports 
and actual effectiveness as judged from fitness 
reports.

TABLE 2

CORRELATIONS BETWEEN COMPOSITE ASSESSMENT-
REPORT PREDICTIONS AND FITNESS-
REPORT EVALUATIONS

Group                 N      r
1                    16     .42
2                    16     .20
1 and 2 combined     32     [value illegible in source]*

* p < .05, one-tailed test.


Another way of illustrating the relationship 
between assessment and fitness reports is 
shown in Table 3. Of those 17 men with 
average or above assessment ratings, 12 
(71%) received average or above fitness 
ratings, while only 6 (40%) of the 15 men 
with below-average assessment ratings re- 
ceived average or above fitness ratings. 

Table 4 presents correlations between the 
individual judge’s assessment ratings and the 
composite fitness ratings (for Groups 1 and 2 
combined). Assuming the composite of the 
fitness-report ratings by all judges is the best 
single measure of actual performance, the 
psychologists varied in their ability to predict 
performance from assessment reports; only 
three of the correlations were significant at 
the .05 level. 

The fitness reports used in this study re- 
quired the evaluator not only to give a nar- 
rative appraisal but to rate the overall per- 
formance of each of his subordinates on a 
5-step adjectival scale: weak, adequate, strong, 
proficient, outstanding. In this study, the 
adjectival ratings were not made available to 
the judges since it was thought that differ- 
ences in rating might reflect variations in 


TABLE 3

PERFORMANCE AS A FUNCTION OF
ASSESSMENT PREDICTION

                          Performance evaluation
Assessment prediction     Average or above    Below average
Average or above (a)            71%                29%
Below average (b)               40%                60%

(a) N = 17.
(b) N = 15.

TABLE 4

CORRELATIONS BETWEEN INDIVIDUAL ASSESSMENT-
REPORT PREDICTIONS AND COMPOSITE
FITNESS-REPORT EVALUATIONS

          Correlations between individual
          ratings of assessment reports
          & composite (7 judges)
Judge     fitness-report ratings
1           .29
2           [illegible in source]
3           .30*
4           [illegible in source]
5           .41*
6           [illegible in source]
7           .13

* p < .05, one-tailed test.


rating bias of raters more than variations in 
performance. Table 5 presents data indicating 
that the judges in this study evaluated the 
narrative section of the ratee’s fitness reports 
in the same direction as the overall letter 
ratings assigned to each man by his super- 
visor. Remembering that the larger the 
numerical rating an individual received the 
higher was his judged effectiveness, indi- 
viduals receiving overall “strong” ratings were 
judged more effective than those receiving 
overall “proficient” ratings (p< .07). The 
biserial correlation between the judged com- 
posite rating of effectiveness and the overall 
letter rating was .34. More important than 
the agreement of supervisors’ ratings of overall


TABLE 5

MEAN EFFECTIVENESS RATINGS FOR INDIVIDUALS
RECEIVING STRONG AND PROFICIENT OVERALL
FITNESS-REPORT EVALUATIONS

                                         Mean composite
                                         effectiveness rating (a)
Individuals receiving overall
  strong fitness-report
  evaluations (b)                              22.3
Individuals receiving overall
  proficient fitness-report
  evaluations (c)                              19.1

Note.—An evaluation of “strong” was superior to “proficient”
in the fitness-reporting system.
(a) As judged by seven psychologists from fitness-report narratives only.
(b) N = 19.
(c) N = 13.


performance with the judges’ ratings based
on the supervisor’s narrative evaluation is the 
fact that the judges’ ratings provide a greater 
range than is usually obtained with fitness 
reports in which the majority of supervisors 
generally restrict themselves to about two 
categories, as they did in this study where 
all the overall ratings were either proficient 
or strong. The high reliability of the 5-point 
ratings made by the psychologists suggests 
that a greater range of performance among 
personnel is recognized by supervisors than 
is typically reflected in their overall ratings 
in fitness reports. 


Trait Prediction 


In this portion of the study, the seven 
psychologists, on the basis of assessment re- 
ports only, rated all 32 employees on 25 traits 
or dimensions that had been abstracted from 
the fitness reports of the total group of indi- 
viduals. Using the specially designed Trait 
Rating Sheet, judges predicted whether each 
individual would be discussed favorably or 
unfavorably on each dimension in his fitness 
report, assuming that he would be discussed 
on all dimensions.

A major difficulty with these data arose be- 
cause 88% of the 188 statements abstracted 
from the fitness reports of the 16 individuals 
were favorable. Similarly, 74% of the total 
number of predictions made by the judges 
were positive. These high-positive base rates 
ensured a great deal of agreement between
predictions based on assessment reports and 
statements drawn from fitness reports. In fact, 
74% of the total group of over 1,300 predic- 
tions made by the seven psychologists were 
“correct,” that is, in agreement with the 
fitness-report narratives. Given the high rate 
of positive statements in fitness reports and 
the nearly as high rate of positive predictions 
made from assessment reports, were the psy- 
chologists able to make a significant improve- 
ment over the base rates in their prediction 
of these specific dimensions of performance? 

One way of answering this question is pre- 
sented in Table 6. If psychologists are able 
to predict specific dimensions of performance 
to a degree exceeding that which would be 
expected by base rates alone, then their predictions
for those individuals described positively
in fitness reports on a specific dimension
should exceed the overall (or base rate)
prediction for all persons on that dimension. 
Since for most dimensions the distribution 
of the psychologists’ predictions was skewed, 
the median rather than the mean percentage 
of psychologists’ predictions of favorable 
fitness-report descriptions on a given dimen- 
sion was taken as the base rate for that 
dimension. For example, if 85% of the judges 
predicted that a certain individual would be 
described favorably on a given dimension and 
in fact he was described favorably in his 
fitness report on this dimension, this would 
constitute a successful prediction if the 
median percentage of judges rating all individuals
positively on that dimension was 71%.
If, however, only 57% of the judges pre- 
dicted that this person would receive favor- 
able mention on this dimension, this would 
be classified as an unsuccessful prediction 
since it is below the 71% base rate. But if 
this person’s fitness report had made an 
unfavorable comment about his initiative and 
resourcefulness, the first prediction (where 
85% of the judges predicted a favorable de- 
scription) would have been classified as un- 
successful since it was above the base rate 
while the second prediction would be success- 
ful (since only 57% of the judges predicted 
a favorable description of this dimension as 
compared with a base rate of 71%). This is 
a rather rigorous test, for it assumes that 
people mentioned favorably in their fitness 
reports on a specific dimension are actually 
stronger, and the people mentioned unfavor- 
ably, weaker on that dimension than people 
not mentioned one way or the other. The 
typical fitness report, of course, does not pro- 
vide a comprehensive or systematic picture 
of a person’s strengths or weaknesses. 
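
The classification rule just described can be written out as a small function. This is a minimal sketch with hypothetical names. The handling of ties follows the report's later remark that predictions falling exactly at the median could not be classified.

```python
def classify_prediction(pct_favorable, base_rate, described_favorably):
    """Classify one prediction against the base rate for its dimension.

    pct_favorable: percentage of judges predicting a favorable description.
    base_rate: median percentage of favorable predictions on that dimension.
    described_favorably: True if the fitness report was favorable.
    """
    if pct_favorable == base_rate:
        return "unclassified"  # ties at the median were set aside
    above_base_rate = pct_favorable > base_rate
    # A prediction succeeds when it departs from the base rate in the
    # same direction as the fitness-report description.
    if above_base_rate == described_favorably:
        return "successful"
    return "unsuccessful"
```

With the figures used in the text, `classify_prediction(85, 71, True)` is successful and `classify_prediction(57, 71, True)` is unsuccessful. The outcomes reverse when the fitness report is unfavorable.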

Table 6 shows that for 83 of the total
group of 150 positive statements drawn from 
fitness reports, the group of seven psycholo- 
gists made predictions on the corresponding 
dimensions that were more in the correct (or 
favorable) direction than the average of the 
total group of predictions made on these 
dimensions. Similarly, for the 21 negative 
statements drawn from the fitness reports, 
the psychologists made 16 correct predictions 
on the corresponding dimensions. Thus, for a

TABLE 6

NUMBER OF SUCCESSFUL AND UNSUCCESSFUL PREDICTIONS
MADE ON SPECIFIC PERFORMANCE
AND PERSONALITY DIMENSIONS
DESCRIBED IN FITNESS REPORTS

Fitness-report     Successful     Unsuccessful
statements         predictions    predictions     Total
Positive               83             67           150
Negative               16              5            21

Total                  99*            72           171

Note.—“Successful” and “unsuccessful” were defined in
terms of base rates; a successful prediction for an individual on
a given dimension was recorded when the percentage of judges
rating that individual in the same direction as the fitness
report’s narrative exceeded the median percentage of the judges
rating all individuals on that dimension. (See the text for a
complete description of this method.)

* p < .02 that this split is significantly different from a
.50:.50 split.


combined total of 99 of 171 predictions, the 
psychologists achieved more accurate predictions
than would have been expected through
base rates alone. A binomial test indicates 
that this ratio of successful to unsuccessful 
predictions exceeds a .50:.50 (chance) split 
at the .02 level. (Seventeen positive state- 
ments drawn from fitness reports could not 
be classified as successful or unsuccessful 
predictions since the percentage of psycholo- 
gists predicting a favorable fitness-report de- 
scription fell at the median for all Ss on 
those dimensions.)
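
The binomial test mentioned above can be reproduced with an exact upper-tail sum. The sketch below assumes a one-tailed test, since the report does not say which form was used, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, n, p=0.5):
    """Exact P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


# 99 successful predictions out of 171 classifiable ones (Table 6).
p_one_tailed = binomial_upper_tail(99, 171)
```

The resulting probability can be compared with the reported .02 level. A two-tailed version would double it.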

Because of the relatively few individuals 
discussed on each of the various dimensions of 
the Trait Rating Sheet in the fitness reports
(no more than 20 of 32 individuals were cited 
on any single dimension), it is not possible 
to compare the relative predictive effective- 
ness of the group of psychologists on different 
dimensions. However, there is evidence that 
the psychologists in this study were better 
able to predict weaknesses than strengths. On 
positive dimensions, 55% of the psychologists’ 
predictions were successful (i.e., better than 
the base rates). On negative dimensions, 76% 
of their predictions were successful. The dif- 
ference between these proportions was signifi- 
cant at the .05 level. 
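
One way to check the comparison of these two success rates is a pooled two-proportion z test on the counts in Table 6. The report does not name the test it used, so the choice of test, the one-tailed reading, and the function name below are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference x1/n1 - x2/n2, using a pooled variance."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


# Negative statements: 16 of 21 successful; positive: 83 of 150 successful.
z = two_proportion_z(16, 21, 83, 150)
p_one_tailed = 1 - NormalDist().cdf(z)
```

Only 21 negative statements were available, so the normal approximation is rough and the result is best read as an order-of-magnitude check on the reported .05 level.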


DISCUSSION


On the basis of this study, it is reasonable 
to conclude that psychologists can predict
significantly better than chance both overall
competence and specific performance and per- 
sonality characteristics of employees using 
only completed assessment reports prepared 
1-4 yr. earlier. 

The modest relationships that emerged for 
the prediction of overall as well as specific 
dimensions of effectiveness are probably arti- 
ficially low, since the least promising indi- 
viduals were not employed at all. This type 
of restriction of range is unavoidable in 
studies of this nature. Had it been possible 
to gather feedback data on all individuals 
assessed, it is likely that the predictive ef- 
fectiveness of the psychologists would have 
been enhanced. 

It was found that the pooled judgments 
of several judges yielded greater predictive 
accuracy than the judgments of individual 
psychologists. Only one of the seven judges 
was able to exceed the predictive accuracy of 
the composite judgments. As Kelley and 
Thibaut (1954) point out, pooling indepen- 
dent judgments should always enhance valid- 
ity except in the situation where the judg- 
ments of the average individual correlate zero 
with the criterion. 

The finding that psychologists were able to 
predict specific performance dimensions and 
personality characteristics better than the 
base rate was encouraging. It should be re- 
membered that these predictions were made 
on the basis of secondary information; that is, 
the psychologists who made the predictions 
used assessment reports that were not formu- 
lated specifically toward making predictions 
on these dimensions. Therefore, the psycholo- 
gists in this study were forced to “read be- 
tween the lines” to make predictions on most 
of the dimensions for most of the employees. 
Higher predictive accuracy could be expected 
if the psychologists who made the predictions 
conducted the initial assessments with these 
dimensions in mind. 

The finding that psychologists were better 
able to predict weaknesses than strengths is 
provocative. If substantiated by further re- 
search, it has interesting implications for the 
assessment process. 

That psychologists can reliably generate 
5-point evaluations of fitness reports that 
originally fell in only two categories is noteworthy.
One of the difficulties in using many
standard fitness reports or appraisal ratings 
as criteria of job performance is their limited 
variance. The results of this study indicate 
that job-performance variance can be mean- 
ingfully expanded through a modified Q sort 
that forces reviewers of these reports to make 
more discriminations among individuals. 

Finally, studies similar to the present one 
should be conducted with persons other than 
psychologists making predictions on the basis 
of assessment reports. This would be more 
nearly analogous to the situation at present 
where the psychologist, through his assessment 
report, supplies a consultative function to 
another individual (or group of individuals) 
who combines this report with other informa- 
tion in order to arrive at a selection decision. 
Implicit in this decision is the prediction of 
how well a given individual will “work out,” 
or even whether he will “work out” at all. 
In the last analysis, these predictions made 
by the persons who typically select or reject 
are the most meaningful ones, and hence 
should be the focus of systematic study. 

Meanwhile, this study does provide reas- 
surance that the assessment process can result 
in meaningful predictions of job behavior as 
evaluated from fitness reports.

Personality correlates of intrinsic job orientation (IO) and extrinsic job orientation
(EO) are studied using a sample of 136 employees in an organization
that provides social services. The results are presented for general intrinsic 
orientation as well as for specific factors included in the two broad categories. 
The results tend to indicate that concern with intrinsic factors signifies approach 
tendencies, while concern with extrinsic factors points to avoidance tendencies. 


Classifying job factors into intrinsic and 
extrinsic categories has been emphasized ever 
since Herzberg, Mausner, and Snyderman 
(1959) published their book The Motivation 
to Work. Intrinsic factors are defined as those 
directly related to the actual performance of 
the job (i.e., achievement, responsibility, 
nature of work, etc.), while extrinsic factors 
are defined as those related to the environ- 
ment in which the job is being performed 
(i.e., company policy, working conditions, 
interpersonal relationships, security, etc.). 

Most of the recent work in this area con- 
centrated on supporting or refuting the propo- 
sition that these two categories are two di- 
mensions of job attitude (Burke, 1966a; 
Ewen, Smith, Hulin, & Locke, 1966; 
Graen, 1966; Herzberg, 1965; Myers, 1964; 
Schwartz, Jenusaitis, & Stark, 1963). A few 
studies, however, related the two categories 
to job level (Gurin, Vernoff, & Feld, 1960; 
Porter, 1962, 1964), age (Saleh, 1964), and 
sex (Burke, 1966b; Centers & Bugental, 
1966; Saleh & Lalljee, in press). In the area 
of mental health, Hamlin and Nemo (1962) 
in a sample of schizophrenics found that 
“motivation seekers” or the intrinsically ori- 
ented improved more than the “hygiene 
seekers” or the extrinsically oriented. 

The authors are not aware of any studies 
that directly investigated the relationship between
the intrinsic-extrinsic dichotomy and
personality variables. An analysis of such 
relationships should provide more insight into 
the nature of these categories. The present 
study is an attempt to fill part of this gap 


1 Requests for reprints should be sent to S. D. 
Saleh, Department of Management Sciences, Uni- 
versity of Waterloo, Waterloo, Ontario, Canada. 


by examining the personality correlates of the 
intrinsic orientations (IO) and the extrinsic 
orientations (EO), and of each specific factor 
included in the two broad categories. 


METHOD 


The study was conducted in an organization whose 
primary function is to provide correctional and social 
services to children and adolescents. The sample con- 
sisted of 136 of the technical staff, all on the same 
job level, who did not have formal education beyond 
high school. The mean age of the group was 36.4 
with a standard deviation of 8.4. Only 14 Ss were 
female. Two scales were administered: the Job Atti- 
tude Scale (JAS; Saleh, 1964) and the Likes and 
Interests Test (LIT; Grygier, 1956). 

The JAS consists of 16 job-related statements, 
each paired with each of the other 15 in a forced- 
choice format. Six of the statements present intrinsic 
factors: achievement, recognition, responsibility, 
nature of work, advancement, growth in skill. The 
other 10 present extrinsic factors: working conditions,
company policy, salary, security, status, technical
supervision, salary needs for family’s sake,
and interpersonal interactions with supervisor,
subordinates, and equals. The overall job orientation
was secured by scoring only those items where an 
intrinsic factor was paired with an extrinsic one 
(60 items). By giving 1 point whenever the intrinsic 
factor is checked, the possible score range is 0-60. 
The range in the present study was 5-55. The 
internal consistency (split half) of this scale was 
.94. The JAS also provided a score for each factor 
by using the 15 items in which the factor was 
paired with the other 15, and the scores of these 
factors range from 0 to 15. The means and standard 
deviations of all dimensions are presented in Table 1. 
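
The scoring scheme described above can be sketched as follows. The factor labels are shortened paraphrases of those in the text, and the function name and the sample respondent are hypothetical. The sketch assumes each pair of statements is presented exactly once.

```python
from itertools import combinations

INTRINSIC = ["achievement", "recognition", "responsibility",
             "nature of work", "advancement", "growth in skill"]
EXTRINSIC = ["working conditions", "company policy", "salary", "security",
             "status", "technical supervision", "family salary needs",
             "relations with supervisor", "relations with subordinates",
             "relations with equals"]
FACTORS = INTRINSIC + EXTRINSIC

# Every statement paired once with each of the other 15.
ALL_PAIRS = list(combinations(FACTORS, 2))
# Pairs that set an intrinsic factor against an extrinsic one.
MIXED_PAIRS = [pair for pair in ALL_PAIRS
               if (pair[0] in INTRINSIC) != (pair[1] in INTRINSIC)]


def score(choices):
    """Return (IO score, per-factor scores); choices maps pair -> chosen factor."""
    io = sum(1 for pair in MIXED_PAIRS if choices[pair] in INTRINSIC)
    per_factor = {factor: 0 for factor in FACTORS}
    for pair in ALL_PAIRS:
        per_factor[choices[pair]] += 1
    return io, per_factor


# Illustrative respondent who always picks the first-listed statement;
# intrinsic factors are listed first, so every mixed pair goes to them.
choices = {pair: pair[0] for pair in ALL_PAIRS}
io_score, factor_scores = score(choices)
```

Counting the pairs this way gives 120 items in all, 60 of them mixed, which matches the 0-60 range for the overall orientation score and the 0-15 range for each factor.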

TABLE 1

MEANS AND STANDARD DEVIATIONS OF THE
JAS DIMENSIONS

Dimension                            M        SD
IO                                  30.2      8.9
Family needs (salary)               11.8      3.0
Achievement                          9.2      2.9
Growth in skill                      8.9      3.3
Security                             8.7      3.9
Relationships with peers             8.6      [illegible in source]
Relationships with subordinates      8.1      2.9
Advancement                          7.9      3.3
Responsibility                       [illegible in source]   3.3
Supervision                          7.0      3.3
Working conditions                   7.0      2.9
Salary                               6.8      3.9
Relationship with supervisor         6.5      2.6
Creative work                        6.4      [illegible in source]
Recognition                          5.6      3.6
Personnel policies                   4.7      2.6
Prestige                             4.5      3.4

Note.—Abbreviations: JAS = Job Attitude Scale; IO
= intrinsic job orientation.

The LIT is a slightly shorter version of the
Dynamic Personality Inventory (Grygier, 1960),
which is based on psychoanalytic theory and
started as a modification of the Krout-Tabin
Personal Preference Scale (Krout & Krout, 1951, 1954).
It ran through over a dozen experimental editions:
Succeeding editions of the test were factorized, examined
for internal consistency and repeat reliability, and
validated and cross-validated (Grygier, 1956). It
consists of 304 items and has 30 scales.
These scales are:

H   Hypocrisy: self-satisfaction with own moral standards, lack of insight into own limitations, social conformity.
Wp  Passivity: liking for comfort, warmth, and mild sensual impressions.
Ws  Seclusion and introspection and their use as a defense mechanism against social anxiety.
OA  Oral aggression: pleasure in biting and crunching, liking for strong and savory foods, suggestion of free-floating aggression and anxiety about its control.
Od  Oral defense: need for guidance and reassurance, clinging attitude.
Om  Need for freedom of movement and for emotional independence.
Ov  Verbal aggression: verbally and/or intellectually aggressive and self-assertive behavior.
Oi  Impulsiveness, changeability, spontaneity, speed of reaction, emotional expressiveness, generosity, and extravagance.
Ou  Unconventionality of outlook, originality, and individuality.
Ah  Hoarding behavior, anxious possessiveness, and stubborn, clinging persistence.
Ad  Attention to details: orderliness, conscientiousness, and perfectionism.
Ac  Conservatism, rigidity, tendency to stick to routine.
[abbreviation illegible in source]  Submissiveness to authority and order.
[abbreviation illegible in source]  Anal sadism: emphasis on strong authority, cruel laws, and discipline.
Ai  Insularity: reserve and mistrust, social and racial prejudice.
Pn  Narcissism: concern with clothes and appearance; sensuous enjoyment of luxury.
Pe  Exhibitionism: conscious enjoyment of attention and admiration.
Pa  Active Icarus complex: psychophysical drive, drive for achievement.
Ph  Fascination by height, space, and distance (passive Icarus complex): aspirations at the fantasy level.
Pf  Fascination by fire, wind, storms, and explosions (sensual aspects of the Icarus complex): perceptiveness of sensual impressions, vivid imagination.
Pi  Icarian exploits: interest in active, pioneering exploration and a liking for adventure.
TI  Enjoyment of tactile impressions: interest in handicrafts and creative manipulation of objects.
CI  Creative, intellectual, and artistic interests.
M   Masculine sexual identification and tendency to adopt masculine social roles, interests, and attitudes.
F   Feminine sexual identification and a tendency to adopt feminine social roles, interests, and attitudes.
MF  Tendency to seek social roles (irrespective of their masculine or feminine characteristics).
SA  Interest in social activities.
C   Interest in children, need to give affection.
EP  Ego defensive persistence: tendency to react with renewed effort in the face of difficulties or opposition.
EI  Initiative, self-reliance, and a tendency to plan, manage, and organize.



The general job orientation scale (IO) and all
the subscales of the JAS were correlated with the 
30 scales of the LIT. 
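
The note to Table 2 gives .17 and .23 as the correlations needed for significance at the .05 and .01 levels with N = 136. The sketch below approximates such thresholds with the Fisher z transformation. The two-tailed reading and the function name are assumptions, and the approximation will not match a t-table value exactly.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha):
    """Approximate two-tailed critical |r| via the Fisher z transformation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


r_05 = critical_r(136, 0.05)
r_01 = critical_r(136, 0.01)

# IO plus 16 factor scores, each correlated with 30 LIT scales.
n_correlations = 17 * 30
expected_by_chance = n_correlations * 0.05
```

Both approximate thresholds fall slightly below the reported values, which appear to be rounded upward. With this many correlations tested at the .05 level, roughly 25 would be expected to reach significance by chance alone, which is worth keeping in mind when reading Table 2.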


RESULTS AND DISCUSSION 


Table 2 shows that the intrinsically ori- 
ented Ss scored low on two of the personality 
variables (Od and Ac), which means that 
they were relatively more independent, confi- 
dent, and flexible. They did not feel that 
they needed much guidance or reassurance 
(Od), and they rejected stability, routine, 
and conventional standards (Ac). 

Moreover, the IO indicates a tendency to 
counteract with renewed effort in the face of 
difficulties or opposition, “doubling efforts 
after criticism,” “sticking at a job when no 
results are forthcoming,” “concentrating on 
one task for a long time” (EP). Initiative, 
self-reliance, and leadership (EI) are also 
some of their qualities. They have a strong 
tendency to be independent and to seek 
freedom of movement (Om). They would,
for instance, “take risks,” “start out on
new ventures,” and “have no permanent
attachments.”

TABLE 2

SIGNIFICANT CORRELATION BETWEEN
[remainder of title not recoverable from the source scan]

[Table body not recoverable from the source scan: the column
headings (LIT scales) and the alignment of the cells were lost.
Rows (JAS dimensions): Intrinsic orientation, Achievement,
Creative work, Advancement, Responsibility, Growth in skill,
Security, Salary, Family needs (salary), Relationship with
subordinates, Relationship with supervisor, Personnel policies,
Prestige.]

Note.—N = 136. A correlation of .17 is significant at the .05 level, and a correlation of .23 is significant at the .01 level.
Abbreviations: JAS = Job Attitude Scale; LIT = Likes and Interests Test. See text for scale abbreviations.

As would be expected in a group whose
primary function is to provide social service,
the IO in our sample expressed a tendency
to seek social roles (MF), especially masculine
roles (M), and to be more interested in
social activities (SA) than the EO.

The results also show that the IO tend
consciously to enjoy attention and admiration
and desire to seek prominence (Pe). For instance,
they would enjoy “being a Master of
Ceremonies,” “sitting in the front row at a
meeting,” and “appearing on the stage.”

The last significant correlation (Ph) indicates
fascination by height, space, and distance
and is related to the flow of ideas and
imagination of IO Ss.

The correlations in Table 2 show that
“creative work” is the dimension most similar
to the IO dimension. The Ss who were concerned
with the nature of their job shared the
following characteristics with the intrinsically
oriented. Both dislike rigidity of approach
(Ac), have a need for emotional independence
(Om), are imaginative (Ph), have a
strong n Achievement (Pa), and both seek social
roles (MF), especially masculine ones
(M). The M scale has such items as “drawing
up plans introducing new ideas” and “making
new gadgets and mechanical devices.”



The results suggest that the psychodynam- 
ics of advancement and of responsibility are 
similar. Both are characterized by self- 
assertion (Ov), drive for achievement (Pa), 
and a tendency to seek masculine social roles 
(M). However, concern with advancement 
correlates negatively with the tendencies to 
seek to give affection (Od, C), while respon- 
sibility is characterized by assuming leader- 
ship roles (EI). It is of interest to note that 
three of the four correlations that describe 
responsibility (Pa, M, EI) describe the pres- 
tige dimension. The first two correlations are 
also shared with advancement. This similarity 
suggests considering prestige an intrinsic
factor rather than an extrinsic one. 

In contrast with the need for advancement 
and responsibility, emphasis on relationships 
with subordinates is negatively related to 
expression of one’s own individuality and 
self-assertion (Ou), to the drive for achieve- 
ment (Pa), and to initiative and self-reliance 
(EI). Also the negative correlation with the 
Pn scale suggests a denial of self-concern and 
of narcissistic needs. 

The salary dimension as well as the dimension
of family need for money seem to be,
in general, negatively related to different kinds
of activity (H, MF, SA, EI, TI), while they
are positively related to passive enjoyment
of comfort, generosity, and extravagance
(Wp, Oi).

The security dimension appears to be quite
meaningful psychologically since it is correlated
with the largest number of the LIT
scales. It is of interest to note that all of the
scales that describe the IO, with only one
exception (MF), are correlated with an opposite
sign with security. Those positively
correlated with IO are negatively correlated
with security and vice versa.

One general observation about the results
is that the majority of correlations with the
intrinsic factors were positive, while with the
extrinsic factors the majority were negative.
It should be emphasized that the difference
is not only mathematical but also is psychologically
meaningful. This seems to indicate
that concern with intrinsic factors signifies
approach tendencies while the regard for
extrinsic factors is characterized by avoidance
tendencies.

In this regard the results support Herzberg’s
notion of considering the “motivators” different
in nature from the “hygienes,” and they
also show the meaningfulness of differentiating
between the two dimensions.

In conclusion it should be pointed out that
although the present study has presented some
meaningful results, more investigations, on
different population samples, are needed to
make any adequate generalization.

COMPARISON OF SEVERAL PATTERNS OF COMMUNICATION

Four patterns of communication were compared with respect to the amount
of material recalled. Pattern 1 consisted of a “receiver” listening to one “trans- 
mitter.” Pattern 2 was similar, except that the “receiver” listened to two 
“transmitters” speaking simultaneously. Pattern 3 required the “receiver” to 
talk while listening to one “transmitter.” Pattern 4 had the receiver talk at 
the same time that he listened to two simultaneous “transmitters.” Statistically 
significant results were obtained favoring the “receiver”? who listened to one 
rather than two transmitters (even though in the case of two “transmitters” 
twice the number of different facts were heard) and listened rather than 
talked and listened. A significant interaction also occurred. Information 
retained by the “receiver” from each of the two simultaneous “transmitters” 


was compared. 


In studying the interrelation between indi- 
viduals in any social system the general area 
of communication is an important concern. 
Dealing with communication becomes very 
critical in the study of small collectivities or 
groupings in which success of goal attainment 
is directly proportional to efficiency of infor- 
mation transmission, receival, and retention. 

Casual observation very readily points out 
that in what is referred to as “everyday inter-
action” there exists a very definite pattern of
“communicative acts.” This patterning will 
be referred to as the “normal pattern.” The 
normal pattern is one in which the communi- 
cative acts are sequential in nature with lit- 
tle or no superimposition. If one person is 
speaking and another starts and persists, the 
first individual stops. Thus there is a dichoto- 
mous positioning of speaker and listener. 

In terms of efficiency, this patterning would 
appear to be less than optimal. Yet while 
Miller (1965, p. 95) points out that “there is 
no a priori reason why two people . . . could 
not question and answer simultaneously,” a 
search of the literature of different patterns 
of communication did not yield any research. 
However, some related studies have been 
conducted where Ss were presented with over-
lapping messages to more than one sense or-
gan. Also related are studies that deal with
selective perception examining the conditions
under which more attention is paid to one
message than another, where both messages
are heard simultaneously. Studies in these
areas have been reviewed by Horn (1965)
and by Broadbent (1958).

1 Requests for reprints should be sent to Morton
Goldman, Psychology Department, University of

Many variations of verbal communication 
other than the normal pattern are possible. 
An individual can listen to several people 
speaking simultaneously; and an individual 
can speak at the same time that he is listen-
ing to others. In an initial study, it was de-
cided to study these two factors using a 2 × 2
design. The receiving Ss (“receivers”) hear
statements from either one individual or two
individuals speaking simultaneously and each
presenting different facts (“transmitters”).
Under each of these two conditions the re- 
ceiver will listen only, or will be talking at 
the same time that he is listening to the 
transmitters impart their facts. Hence one 
factor of the current study deals with listen- 
ing to different numbers of transmitters (one 
or two), and the other factor deals with the 
behavior of the receiver (listening only, or 
talking and listening). In all treatments the 
dependent variable will be the number of 
facts the receiver can recall. 
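
The two factors just described can be laid out as a small illustrative sketch. It is not part of the original procedure; the constant and function names are hypothetical, and the cell labels follow the abbreviations introduced in the Method section (NT = not talking, T = talking, L1 or L2 = one or two transmitters).

```python
from itertools import product

# Levels of the two factors; names are illustrative, not from the study.
RECEIVER_BEHAVIOR = ("NT", "T")   # NT = listen only, T = talk while listening
TRANSMITTERS = (1, 2)             # number of simultaneous transmitters


def treatment_cells():
    """Return the four treatment labels of the 2 x 2 design."""
    return [f"{behavior}-L{n}"
            for behavior, n in product(RECEIVER_BEHAVIOR, TRANSMITTERS)]


cells = treatment_cells()
print(cells)
```

Crossing the two factors gives four cells, each with the same dependent variable (number of facts recalled).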

Related to the talking and listening pro- 
cedures to be examined in the current study is 
a technique called the shadowing method, 
which has been employed in studying selective 
listening. The shadowing method, first de-
scribed by Cherry (1953), has an S repeat
aloud a spoken prose message as it is heard
with as short an interval as possible between
the spoken and shadow message. However, in
the current paper, Ss were asked to say aloud
a message differing from the one to which they 
were listening. 

Some of the results of the current study 
might appear obvious. For example, it might 
be anticipated that the amount of information 
that the receiver can recall would be greater 
when he devotes all of his attention to listen- 
ing than if he diverts his attention by talking 
and listening at the same time. However, in 
the treatment where the receiver is listening 
simultaneously to two transmitters, each pre- 
senting different material, the number of facts 
presented in a given time period is twice the 
number presented to the receiver who is in 
the treatment where there is only one trans- 
mitter. How would the amount of information 
recalled by the receivers compare for these 
two treatments? The results are by no means
obvious. Further, the design to be used allows
an answer to the question of whether the
receiver, when he hears two transmitters,
pays equal attention to both or disregards
one transmitter in favor of the other.

Since the communication patterns to be 
used in this study generally can be assumed 
to be unusual for most individuals, it was 
decided to give Ss five trials to see if with 
repeated practice results would change. 


METHOD 


Subjects. The Ss used in the study were recruited 
from the General Psychology course at the Univer- 
sity of Missouri, Kansas City. Ten Ss were ran- 
domly assigned to each of the four treatments mak- 
ing a total of 40 Ss. 
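
The assignment described above (40 Ss, 10 per treatment, at random) can be sketched as follows. The report does not say how the randomization was carried out, so the shuffle-and-split approach, the function name, and the seed are assumptions made only for illustration.

```python
import random


def assign_subjects(subject_ids, treatments, per_group, seed=None):
    """Shuffle subjects and split them into equal-sized treatment groups."""
    shuffled = list(subject_ids)
    if len(shuffled) != per_group * len(treatments):
        raise ValueError("number of subjects must equal per_group * treatments")
    random.Random(seed).shuffle(shuffled)
    return {
        treatment: shuffled[i * per_group:(i + 1) * per_group]
        for i, treatment in enumerate(treatments)
    }


# Hypothetical subject numbers 1..40; the seed only makes the sketch repeatable.
groups = assign_subjects(range(1, 41), ["NT-L1", "NT-L2", "T-L1", "T-L2"],
                         per_group=10, seed=0)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```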

Treatments. Four treatments described above were
devised: (a) the NT-L1 treatment consisted of hav-
ing an S (not talking) act as a receiver and listen to
one person, a transmitter, impart information; (b)
the NT-L2 treatment was similar to the above, ex-
cept that the receiver listened to two transmitters
speak simultaneously, each imparting different infor-
mation; (c) the T-L1 treatment required the re-
ceiver to talk at the same time he was listening to
one transmitter (the message spoken by the receiver
was different from the message spoken by the trans-
mitter); (d) the T-L2 treatment required the re-
ceiver to talk while he listened to two transmitters
simultaneously talking, each message being different.
Thus, in the T-L1 treatment two people are speak-
ing at the same moment, one receiver and one trans-
mitter; in the T-L2 treatment three people are
speaking at the very same moment, one receiver and
two transmitters. In all the treatments, the receiver
was given five trials, each trial consisting of new
transmitters imparting different information. The
transmitters consisted of college students who were
not acquainted with Ss serving as receivers.


TABLE 1

MEAN RETENTION FOR TRIALS OF EACH TREATMENT,
TRIALS OVER ALL TREATMENTS, AND
TREATMENTS OVER ALL TRIALS

                                          Trials                                      Treatments
Treatments        1            2            3            4            5             over all trials
NT-L1             8.1          8.8          8.3          6.8          [illegible]   7.59
NT-L2             6.8          6.0          8.0          6.8          8.2           7.16
T-L1              [illegible]  5.0          4.5          5.0          4.2           4.84
T-L2              3.4          3.7          [illegible]  [illegible]  2.6           3.14

Trials over all
treatments        5.85         5.88         6.02         5.32         5.62

Procedure. Each transmitter was asked to com-
plete a form requesting the following 10 items: (a)
name, (b) favorite sport, (c) college major, (d)
advisor’s name, (e) mother’s first name, (f) father’s
first name, (g) father’s occupation, (h) possible
future occupation, (i) religious affiliation, and (j)
place of birth. After being placed in a different ran-
domized order for each S, these 10 items served as
the information which each transmitter was to re- 
cite to the receivers. To facilitate reciting the items, 
the transmitters were given a short period of time 
to familiarize themselves with the order of the items. 
The information forms were kept in front of them 
during the transmission. This procedure was also 
followed for the receivers in the T-L1 and T-L2
treatments, where they had to recite information. 
The Ss all spoke at approximately the same speed 
(the speed of normal conversation) and generally 
took a similar amount of time to impart the items 
of information. In all four treatments the receivers, 
upon the completion of each trial, immediately re- 
corded the information they were able to retain. 
This retained information, recorded by the receivers, 
served as the dependent variable. 


RESULTS 


In each of the four treatments the trans- 
mitters imparted 10 factual statements about 
themselves. At the end of each trial the re- 
ceivers were asked to write down as many 
facts as they could recall. One point was al- 
lotted to a receiver for each correct recorded 
fact. Table 1 shows the means for the 10 
receivers in each of the four treatments over 
the five trials. 
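
The scoring rule just stated (one point per correctly recorded fact) can be illustrated with a short sketch. The report does not describe how a written response was judged correct, so the case-insensitive exact match used here is an assumption, and the example facts are invented.

```python
def score_recall(presented, recalled):
    """Award one point for each presented fact that the receiver wrote down.

    Matching is case-insensitive and ignores surrounding spaces; this
    matching rule is an assumption, not the authors' stated criterion.
    """
    presented_set = {fact.strip().lower() for fact in presented}
    recalled_set = {fact.strip().lower() for fact in recalled}
    return len(presented_set & recalled_set)


# Invented example: three facts presented, two of them recalled correctly.
presented_facts = ["tennis", "history", "Chicago"]
written_down = ["Tennis", "chicago", "Mary"]
points = score_recall(presented_facts, written_down)
print(points)
```

Under this rule the maximum score is 10 with one transmitter and 20 with two.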

The results were analyzed with a 2 × 2
trend analysis of variance. For the factor com-
paring the amount of information retained by
the receiver when he listened to one transmit-
ter as opposed to two transmitters giving
simultaneous information, an F = 8.02 (df
= 1/36, p < .01) was obtained, favoring lis-
tening to one transmitter. This result occurred
even though the receiver, when he listened to
one transmitter, heard only 10 facts and thus
could obtain a maximum of 10 points; while
the receiver, when he listened to two simul-
taneous transmitters, heard 20 facts and
could attain a maximum of 20 points. As
would be anticipated, for the factor compar-
ing retained information when the receiver
listened only as opposed to talking while
listening, an F = 65.48 (df = 1/36, p <
.001) was obtained, favoring listening only.
For the interaction of the above two factors
an F = 11.18 (df = 1/36, p < .01) was ob-
tained, which reflected that Ss in the T-L2
treatment recalled fewer facts in comparison
to the T-L1 treatment than did Ss in the
NT-L2 treatment in comparison to the NT-L1
treatment. Stated another way, the drop in
recall from listening to one transmitter to
listening to two was greater when Ss them-
selves were also talking than when they were
only listening.
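
The direction of this interaction can be read directly from the treatment means over all trials in Table 1. The following sketch only restates that arithmetic; it is not the analysis of variance itself, and the variable names are illustrative.

```python
# Treatment means over all trials, as reported in Table 1.
MEAN_RECALL = {"NT-L1": 7.59, "NT-L2": 7.16, "T-L1": 4.84, "T-L2": 3.14}


def drop_from_one_to_two(behavior):
    """Mean recall with one transmitter minus mean recall with two."""
    return MEAN_RECALL[f"{behavior}-L1"] - MEAN_RECALL[f"{behavior}-L2"]


drop_listening = drop_from_one_to_two("NT")   # receivers who only listened
drop_talking = drop_from_one_to_two("T")      # receivers who also talked
interaction_contrast = drop_talking - drop_listening

print(f"drop when only listening: {drop_listening:.2f}")
print(f"drop when also talking:   {drop_talking:.2f}")
print(f"difference of the drops:  {interaction_contrast:.2f}")
```

A positive difference of the drops corresponds to the pattern described in the text: adding a second transmitter costs talking receivers more recall than it costs listening receivers.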

No discernible pattern could be detected for 
any of the treatments over the five trials, the 
F for the trial source of variance being less 
than 1. The Fs obtained for the interactions 
of the trials and the other factors were non-
significant.

In the two treatments, NT-L2 and T-L2,
where Ss are listening to two transmitters si-
multaneously, the question can be raised as to
whether the receivers recall approximately the
same amount of material from each trans-
mitter or if Ss tend to favor one of the two
transmitters and slight the other. Further, an
additional question can be asked: whether this
favoring process occurs differentially in the
T-L2 treatment as compared to the NT-L2
treatment. To obtain evidence bearing on
these two questions, for each receiver (com-
bining all trials in the NT-L2 and T-L2 treat-
ments), the transmitters were divided into
two groups in the following manner. Of the
two transmitters speaking simultaneously to
a given receiver, the transmitter from whom
that given receiver recalled more information
was called the high-recall transmitter, and
the transmitter from whom the receiver re-
called less information was called the low-
recall transmitter. The amount of information
recalled by the receivers from the high- and
low-recall transmitters is presented in Table
2.


TABLE 2

MEAN RECEIVERS RECALL SCORES FOR HIGH AND LOW
TRANSMITTERS IN THE NT-L2 AND T-L2
TREATMENTS COMBINING THE
FIVE TRIALS

Receivers            Score
NT-L2
  High recall        22.9
  Low recall         12.3
T-L2
  High recall        [illegible]
  Low recall         3.4

In the NT-L2 treatment the receivers ob-
tain approximately 2/3 of their score from the
high-recall transmitter and 1/3 from the low-
recall transmitter. In the T-L2 treatment, the
receivers obtain approximately 4/5 of their
score from the high-recall transmitter and 1/5
from the low-recall transmitter. Comparing
the high-recall mean with the low-recall mean
in each of the two treatments, significant t’s
were obtained for both treatments (p <
.001). Comparing the mean percentage dif-
ference between the high- and low-recall score
in the NT-L2 treatment with the mean percentage
difference between the high- and low-recall
score in the T-L2 treatment, a significant t
was also obtained (p < .01). Thus, when a
receiver is listening to two transmitters, he
recalls significantly more information from
one than from the other. He does not recall
approximately equal amounts of information
from each transmitter. Further, the percent-
age recall difference between the two trans-
mitters is increased if the receiver is
also talking rather than only listening.
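
The "approximately 2/3 and 1/3" split can be checked against the NT-L2 entries of Table 2 (22.9 and 12.3). The sketch below is illustrative only; the function name is hypothetical.

```python
def recall_shares(high_score, low_score):
    """Return the proportions of total recall drawn from each transmitter."""
    total = high_score + low_score
    if total <= 0:
        raise ValueError("total recall must be positive")
    return high_score / total, low_score / total


# NT-L2 means from Table 2: high-recall and low-recall transmitters.
high_share, low_share = recall_shares(22.9, 12.3)
print(f"high-recall share: {high_share:.2f}")
print(f"low-recall share:  {low_share:.2f}")
```

The high-recall share comes out close to two thirds, in line with the description in the text.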


DISCUSSION 


In the current study it was found that the 
amount of information recalled when listening 
to one transmitter is significantly greater 
than when listening to two transmitters speak-
ing simultaneously. This result occurred in
spite of the fact that when two transmitters
spoke simultaneously, 20 facts were being 
presented, as compared to 10 facts when only 
one person spoke. When the receivers were 
listening to one transmitter, they were able to 
recall approximately 75% of the material, and 
thus even in this condition were unable to 
process all the information. In the situation 
when two transmitters were used, the receivers 
were listening to twice the amount of ma- 
terial, but in the same unit of time. The re- 
sults showed that further overloading the 
receivers past the point of maximum operat- 
ing ability leads to a decline in absolute effi- 
ciency. 

Another finding of the present study is that 
speaking at the same time that one is listen-
ing significantly reduces the amount of ma-
terial that can be recalled, compared with
listening without speaking. Since in this study the talking
receivers were functioning at slightly less than 
50% efficiency in recall of the material pre- 
sented to them, it would be anticipated that 
there would be an even larger relative deteri- 
oration than for nontalking receivers, when 
two transmitters were used. This proved to 
be the case as supported by the obtained sig- 
nificant interaction. 
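
The efficiency figures quoted in this discussion (about 75% for listening receivers and slightly under 50% for talking receivers), and the larger relative deterioration for talking receivers, follow from the Table 1 means. The sketch below is an illustration of that arithmetic, assuming 10 facts per transmitter as stated in the Results; the names are hypothetical.

```python
# Treatment means over all trials (Table 1) and facts presented per trial.
MEAN_RECALL = {"NT-L1": 7.59, "NT-L2": 7.16, "T-L1": 4.84, "T-L2": 3.14}
FACTS_PRESENTED = {"NT-L1": 10, "NT-L2": 20, "T-L1": 10, "T-L2": 20}


def efficiency(cell):
    """Proportion of the presented facts that was recalled on average."""
    return MEAN_RECALL[cell] / FACTS_PRESENTED[cell]


def relative_drop(one_cell, two_cell):
    """Loss in mean recall with two transmitters, relative to one."""
    return (MEAN_RECALL[one_cell] - MEAN_RECALL[two_cell]) / MEAN_RECALL[one_cell]


nontalking_drop = relative_drop("NT-L1", "NT-L2")
talking_drop = relative_drop("T-L1", "T-L2")

for cell in MEAN_RECALL:
    print(f"{cell}: efficiency {efficiency(cell):.1%}")
print(f"relative drop, nontalking: {nontalking_drop:.1%}")
print(f"relative drop, talking:    {talking_drop:.1%}")
```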

In the NT-L2 treatment, each receiver
heard 20 facts in a given unit of time as
compared to 10 facts for the receivers in the
NT-L1 treatment. When a new trial was pre-
sented, different transmitters were used which
gave new facts. It was found that with re-
peated trials, at least for five repetitions, the
results within chance fluctuations did not
change. It would have been possible in two
ways to arrange the conditions of the experi-
ment so that the number of facts per unit of
time would have remained the same for the
NT-L1 and NT-L2 treatments: (a) the speak-
ing time of the transmitters in the NT-L2
treatment could be made twice as long; or
(b) the speaking speed of the transmitters
could remain the same but the transmitters
in the NT-L2 treatment could have repeated
the same information for two trials. The com-
parison of the amount of recall of the receiv-
ers in the NT-L1 and NT-L2 treatments un-
der these two conditions could be explored in
further research. This same comparison could
also be made for the T-L1 and T-L2 treat-
ments.

The results of the study showed that when
one individual is listening to two simultane-
ous speakers, he does not pay the same atten-
tion to each. Further, if the listener is himself
talking, an even greater proportion of his at-
tention is given to one of the two other talk-
ing individuals. Would this result have oc-
curred to an even larger extent if NT-L3 or
T-L3 treatments had been used? Still an ad-
ditional area of further research would be to
investigate the factors which determine to
which of the two speakers a given listener
will devote more of his attention.

Reference above has been made to the 
shadowing procedure used by Cherry (1953). 
Cherry had his Ss shadow a voice presented 
to one ear while a different unrelated message 
appeared at the other ear. The results showed 
that Ss could attend to the required message 
while ignoring the unrelated message. A 
modification of shadowing occurs when a 
translator converts a message from one lan- 
guage into another at the same time he is 
listening to the message to be translated. In 
this case, the simultaneous translation or 
shadow message is not a duplication of the 
same sounds and words. Treisman (1965) has 
been concerned with comparing shadowing 
and simultaneous translation. The procedure 
used in the T-L2 treatment reported here can
be thought of as still a further modification 
of shadowing. The Ss in this treatment were 
required simultaneously to attend to two dif- 
ferent messages while at the same time saying 
aloud a third message, all three messages con- 
taining the same factual categories of infor- 
mation but different content, where the in- 
formation spoken by S pertains to himself. 
Thus the current study compared the amount 
of recall of Ss who were exposed to several 
conditions of simultaneous perception. More 
information was retained by Ss in the NT-L1
treatment than by Ss in the NT-L2,
T-L1, and T-L2 treatments. Nevertheless, Ss
could recall facts from two different messages 
simultaneously presented while responding 
aloud with a related but unique third mes- 
sage. 




REFERENCES

Broadbent, D. E. Perception and communication.
New York: Pergamon Press, 1958.

Cherry, E. C. Some experiments on the recognition
of speech, with one and two ears. Journal of the
Acoustical Society of America, 1953, 25, 975-979.

Horn, G. Physiological and psychological aspects of
selective perception. In D. S. Lehrman, R. A.
Hinde, & E. Shaw (Eds.), Advances in the study
of behavior, Vol. 1. New York: Academic Press,
1965.

Miller, G. A. Speaking in general. In I. D. Steiner
& M. Fishbein (Eds.), Current studies in social
psychology. New York: Holt, Rinehart and Win-
ston, 1965.

Treisman, A. M. The effects of redundancy and
familiarity on translating and repeating back a
foreign and a native language. British Journal of
Psychology, 1965, 56, 369-379.

(Received October 28, 1968)


Journal of Applied Psychology 
1969, Vol. 53, No. 6, 456-459 


WORK VALUES AND JOB SATISFACTION 
Two groups of airmen (students and persons in permanent assignments) com-
pleted measures of their job satisfaction and of their work values. Consistent
relationships appeared between these two sets of variables. Evidence is pre- 
sented which indicates that the job satisfaction variance controlled by work 
values is independent of that controlled by other variables. 


The way a person evaluates work in gen- 
eral should be related to his attitudes toward 
his particular job. Someone who thinks that 
all work is an abomination to be undertaken 
only when all other strategies fail will likely 
be unhappy even in the most pleasant work 
situation. On the other hand, a person who 
feels that personal worth results only from 
self-sacrificing work or occupational achieve- 
ment would likely derive some satisfaction 
even in a demanding menial position. 

Previous investigators have discussed work 
values related to the ideals of the Protestant 
Ethic (Weber, 1958). Lenski (1961) reported 
a study which utilized a stratified sample of 
Detroit residents. He found differences in 
work values between four socioreligious
groups. His most general finding related to 
work values was that white Protestants and 
Jews were more likely to be committed to the 
spirit of capitalism and the ideals of the 
Protestant Ethic than were Negro Protestants 
and Catholics. A similar conclusion is sup- 
ported by the findings of Turner and Law- 
rence (1965). Among workers from rural com- 
munities who were predominantly Protestant 
they found job responses which would be ex- 
pected from persons ascribing to the ideals of 
the Protestant Ethic. They found responses 
which would not be predicted from Protestant 
Ethic ideals among workers in urban areas 
who were predominantly Catholic. 

If such differences are predictable from 
knowledge of religious affiliation, psychologi- 
cal explanation requires that they be medi- 
ated by some psychologically measurable 
difference. Differences in the job responses of 
Protestants and Catholics could be mediated
by differences in work values. If the work
value differences can be measured by some
psychological measurement device, it should
allow the prediction of within-group differ-
ences in job responses as well as between-
group differences. This study is an attempt at
the measurement of individual differences in
work values. It was predicted that persons
who ascribe to Protestant Ethic ideals would
be more satisfied with their job.

1 Requests for reprints should be sent to the au-
thor, who is now at the Department of Psychology,
University of California, Berkeley, California 94720.


METHOD 


As a part of a larger study of 448 airmen and 
noncommissioned officers from the United States 
Air Force (Blood, 1968), Ss were asked to complete 
the Job Descriptive Index (JDI) scales (Smith,
Kendall, & Hulin, 1969), two Faces scales (Kunin, 
1955) which measured satisfaction with the job in 
general (JIG) and satisfaction with life in general 
(LIG), and an eight-item scale intended to measure 
amount of agreement with the Protestant Ethic. 
There were 420 usable questionnaires. Of these, 114 
were from airmen who were enrolled as full-time 
students in courses in aircraft maintenance. The 
other 306 Ss were serving in permanent assignments 
on a variety of low skill level tasks principally as 
technicians or as maintenance, transportation, or 
supply workers. 

The Protestant Ethic scale had four items which 
were intended to be in agreement with the Protes- 
tant Ethic and four which did not agree with the 
ideals of the Protestant Ethic. For each item, Ss 
responded with a number from 1 to 6 where 1= 
disagree completely and 6=agree completely. A 
component analysis of the eight items with a Vari- 
max rotation of two components demonstrated that 
the two subsets of items were appropriately inter- 
related. Table 1 shows the items and their com- 
ponent loadings. 

The four items loading heavily on the first com- 
ponent (Items 2, 4, 6, and 7) were summed for 
each individual and called the “proProtestant Ethic” 
score. The four items with large loadings on the 
second component (Items 1, 3, 5, and 8) were 
summed and called the “nonProtestant Ethic” score. 
These two work value dimensions were correlated
.113 among the 114 students in the sample and
−.028 among the 306 permanent party members in
the sample. Correlations were computed between the
Protestant Ethic dimensions and the satisfaction
measures. The five JDI scales were included sepa-
rately and also summed in these analyses. Because
there was evidence that there were differences be-
tween permanent party and technical school stu-
dents in responses to JDI scales (Blood, 1968) the
correlations were made separately for permanent
party and students.


TABLE 1

LOADINGS OF PROTESTANT ETHIC ITEMS ON
VARIMAX ROTATED COMPONENTS

                                                        Components
Items                                                I             II

1. When the workday is finished, a per-
   son should forget his job and enjoy
   himself.                                          [illegible]   [illegible]

2. Hard work makes a man a better per-
   son.                                              .60           [illegible]

3. The principal purpose of a man’s job
   is to provide him with the means
   for enjoying his free time.                       [illegible]   .61

4. Wasting time is as bad as wasting
   money.                                            .67           .02

5. Whenever possible a person should
   relax and accept life as it is, rather
   than always striving for unreach-
   able goals.                                       [illegible]   [illegible]

6. A good indication of a man’s worth is
   how well he does his job.                         [illegible]   .02

7. If all other things are equal, it is bet-
   ter to have a job with a lot of re-
   sponsibility than one with little re-
   sponsibility.                                     [illegible]   [illegible]

8. People who “do things the easy way”
   are the smart ones.                               [illegible]   [illegible]
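
The scoring of the two work value dimensions described in the Method section (Items 2, 4, 6, and 7 summed as the proProtestant Ethic score; Items 1, 3, 5, and 8 summed as the nonProtestant Ethic score; each item answered from 1 to 6) can be sketched as follows. The function name and the example responses are hypothetical and are given only for illustration.

```python
PRO_ITEMS = (2, 4, 6, 7)   # items summed into the "proProtestant Ethic" score
NON_ITEMS = (1, 3, 5, 8)   # items summed into the "nonProtestant Ethic" score


def score_protestant_ethic(responses):
    """Return (pro, non) sums from a mapping of item number to a 1-6 rating."""
    for item in PRO_ITEMS + NON_ITEMS:
        rating = responses.get(item)
        if rating is None or not 1 <= rating <= 6:
            raise ValueError(f"item {item} needs a rating from 1 to 6")
    pro = sum(responses[item] for item in PRO_ITEMS)
    non = sum(responses[item] for item in NON_ITEMS)
    return pro, non


# Invented responses of one S (1 = disagree completely, 6 = agree completely).
example = {1: 2, 2: 5, 3: 3, 4: 6, 5: 2, 6: 4, 7: 5, 8: 1}
pro_score, non_score = score_protestant_ethic(example)
print(pro_score, non_score)
```

Each score can range from 4 to 24, since four items rated 1 to 6 are summed.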


RESULTS AND DISCUSSION 


Table 2 shows the correlations between 
the satisfaction measures and the Protestant 
Ethic dimensions in both the student sample 
and the permanent party sample. Though 
none of the correlations is large, the direc- 
tions of the relationships are obvious. With 
only two exceptions the data show that agree- 
ment with the Protestant Ethic is directly 
related to satisfaction, and agreement with 
nonProtestant Ethic items is inversely related 
to satisfaction. This result implies that the 
more a worker agrees with the ideals of the 
Protestant Ethic, the more he will be satisfied 
in his work and with life in general. There 
are other important influences on the satisfac-
tion of workers. In the sample for this study
there were significant differences between stu- 
dents and permanent party on the satisfaction 
variables. Nonetheless, within the student and 
permanent party samples, among workers on 
similar tasks in the same organization who 
share the same reward structure, additional 
variance in job satisfaction was predictable. 

In order to assess the contribution of the 
Protestant Ethic dimensions to job satisfac- 
tion relative to other variables, a multiple 
correlation was computed for each of the 
satisfaction measures using age, education, 
tenure, father’s occupation, and the Protes- 
tant Ethic dimensions as independent varia- 
bles. The results of these analyses are shown 
in Table 3. Darlington (1968) has pointed 
out the dangers of overinterpreting multiple 
regression coefficients. However, it is possible 
to see that the Protestant Ethic dimensions do 
make a contribution to the prediction of job 
satisfaction, especially if we consider only the 
general measures of job satisfaction, JDI sum, 
and JIG. In addition to the regression coeffi-
cients Darlington (1968) suggests considera-
tion of validity and usefulness 2 in assessing
the “importance” of predictor variables. The
validity coefficients for the Protestant Ethic
dimensions rank 1 and 2 among all predictors
for predicting JDI sum and JIG in the stu-
dent sample, and they rank in the top three
predictors for predicting JDI sum and JIG in
the permanent party sample. In both samples
the usefulness of the Protestant Ethic dimen-
sions ranks 1 and 2 among predictors of JDI
sum and JIG.

2 The usefulness of a particular predictor is the
difference between R² computed with all of the pre-
dictors and R² computed with all of the predictors
included except the predictor of interest.


TABLE 2

CORRELATIONS BETWEEN SATISFACTION MEASURES
AND PROTESTANT ETHIC DIMENSIONS

                     Students (N = 114)          Permanent (N = 306)
Satisfaction
variables            Pro           Non           Pro           Non
JDI sum              [illegible]   [illegible]   [illegible]   [illegible]
JDI work             .09           −.16*         [illegible]   [illegible]
JDI supervisor       .06           −.15          [illegible]   −.01
JDI people           [illegible]   [illegible]   .10*          −.13*
JDI pay              −.02          −.31**        .14*          −.05
JDI promotion        [illegible]   .02           .05           −.06
Job in general       [illegible]   [illegible]   .10*          −.13*
Life in general      .08           −.09          .17**         −.06

* p < .05.
** p < .01.


TABLE 3

STANDARDIZED REGRESSION WEIGHTS AND MULTIPLE CORRELATION COEFFICIENTS FOR THE
PREDICTION OF SATISFACTION MEASURES AMONG STUDENTS AND PERMANENT PARTY

Students (N = 114)

Independent                                          Dependent variables
variables        Sum          Work         Super        People       Pay          Prom         JIG          LIG
Age              −.06         .04          −.12         .01          −.02         −.12         .06          .01
Education        −.10         −.14         −.04         −.12         .01          .04          −.06         −.15
Tenure           [illegible]  .09          .14          −.02         .03          .08          .01          −.01
Father’s occ.    .05          .03          [illegible]  −.02         −.12         .09          −.03         .04
Pro (PE)         [illegible]  .12          .08          .20          .01          .28          .24          .09
Non (PE)         −.24         −.16         −.14         −.17         −.33         .00          −.20         −.09
R                [illegible]  .26          [illegible]  .28          [illegible]  [illegible]  .31          .19

Permanent party (N = 306)

Independent                                          Dependent variables
variables        Sum          Work         Super        People       Pay          Prom         JIG          LIG
Age              .16          .05          .20          .14          −.01         .09          −.01         −.12
Education        −.08         −.05         −.03         −.10         −.04         −.03         −.05         −.01
Tenure           −.06         .08          −.13         .05          .09          −.34         .12          [illegible]
Father’s occ.    [illegible]  .14          .04          .09          .10          [illegible]  .07          .03
Pro (PE)         .14          [illegible]  .08          .07          [illegible]  .07          .09          .18
Non (PE)         −.13         −.13         −.02         −.16         −.06         −.05         −.15         −.06
R                [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]

* p < .05.
** p < .01.
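
The usefulness index defined in Footnote 2 (R² with all predictors minus R² with the predictor of interest left out) can be sketched in a few lines. This is a minimal illustration using ordinary least squares on invented data; it does not reproduce the analyses of this study, and all names and values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the predictor columns plus intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def usefulness(predictors, y, name):
    """R^2 with all predictors minus R^2 with the named predictor removed."""
    full = r_squared(list(predictors.values()), y)
    reduced = r_squared([col for key, col in predictors.items() if key != name], y)
    return full - reduced


# Invented data for six hypothetical respondents.
predictors = {
    "age": [19.0, 20.0, 21.0, 22.0, 23.0, 24.0],
    "pro_pe": [12.0, 18.0, 15.0, 20.0, 14.0, 22.0],
}
satisfaction = [7.0, 10.5, 8.0, 11.0, 8.5, 12.0]

u_age = usefulness(predictors, satisfaction, "age")
u_pro = usefulness(predictors, satisfaction, "pro_pe")
print(f"usefulness of age:    {u_age:.3f}")
print(f"usefulness of pro_pe: {u_pro:.3f}")
```

Because the reduced model is nested in the full model, the index cannot be negative apart from rounding error; a larger value means the predictor adds more to R² beyond the other predictors.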

Whether a causal relationship exists be- 
tween the work value dimensions and job 
satisfaction is a researchable question. It 
seems more logical to the author to assume 
that work values precede and influence job 
satisfaction rather than the opposite. Future 
research should investigate this relationship. 
Perhaps higher job satisfaction is partly a 
consequence of congruence between indi- 
vidual and institutional goals. If future re- 
search establishes that this is the case, it 
would constitute a justification for the sug- 
gestion that workers should hold goals similar 
to those of management. This is now implied 
by human relations theorists without justifi- 
cation. 


Expansion and refinement of the Protestant 
Ethic dimension measurements should also be 
undertaken in future studies. Not only addi- 
tional items, but other item formats should 
be attempted. An antiProtestant Ethic di- 
mension should be added, and hopefully the 
nonProtestant Ethic dimension should be
defined in terms of what it does specify rather 
than what it does not specify. 

As a final thought, some recent attempts 
to assimilate hard-core unemployed into the 
industrial work force have attempted to in- 
still ideals (resocialize) similar to the ideals 
of the Protestant Ethic, for example, hard 
work brings rewards, occupational achieve- 
ments bring prestige, and so forth. In evalu- 
ating the impact of these programs, it will be 
helpful if it is possible to measure changes in 
such work values. One of the first concerns 
of administrators of such programs should be 
to find out if changes in work values are ac- 
companied by changes in job satisfaction and 
job performance. 
USE OF LEADERSHIP POWERS IN INDUSTRY 1

This study investigated (a) the range of corrective powers available to military
and industrial supervisors when correcting subordinates’ behavior and (b) the
factors influencing the supervisors’ use of these powers. Both situational and
personal factors (number of employees supervised, years of experience as a
supervisor, and the nature of the problem presented by the subordinate)
were found to influence the supervisor’s choice of corrective power. Military
supervisors relied more on direct attempts to change subordinates’ behavior
through reliance upon extra instruction, direct punishments, and changes in
the task environment of subordinates. Industrial supervisors relied more on their
persuasive powers.


Studies of leadership behavior have focused 
mainly upon the identification of the dimen- 
sions of a leader’s behavior and the conse- 
quence of variations in these dimensions for a 
subordinate’s morale and productivity (Bales, 
1953; Fiedler, 1965; Fleishman, 1953; Kahn 
& Katz, 1960). These studies have revealed 
the importance of leadership behavior as 
related to task direction and to maintaining 
the socioemotional well-being of subordinates. 
Less attention has been paid to the question 
of how leaders use the formal social powers 
associated with their organizational roles. 
Yet this question is of particular interest in 
industrial and military organizations. By 
virtue of their roles, formally appointed lead- 
ers control resources that are valued or re- 
quired by subordinates. Among these resources 
are the control of sanctions, control of com- 
munication channels, and control of the di- 
rection of task performance. In essence these 
controls, or social powers, provide the means 
by which the formally appointed leader can 
exercise influence and thus have a central role 
in mediating the outcomes for subordinates. 

Many questions can be raised concerning 
these powers. For example, does the amount 
of experience the individual has had as a 
supervisor relate to his use of the powers the 
organization allows him to control? In an 


1 The collection of data was made possible through 
the helpful cooperation of Charles A. Thomas and 
the American Association of Industrial Management, 
National Metal Trade Association.

2 Requests for reprints should be sent to David
Kipnis, Department of Psychology, Temple Univer- 
sity, Philadelphia, Pennsylvania 19122. 


unpublished doctoral dissertation by Schreiber 
(cited in Carter, 1952) it was found that 
when inexperienced leaders were given too 
much power, their behavior was disrupted. 
What situational factors affect the use of 
social powers? Do supervisors directing large 
numbers of men use their powers in the same 
fashion as those directing fewer men? Does 
the personality of the supervisor influence the 
ways in which he uses the resources that he 
controls? Do overly aggressive supervisors 
use their coercive powers to induce compli- 
ance more frequently than less aggressive 
supervisors? 

In essence we are asking whether there is a
psychology of the use of social powers within
industry. With the exception of the important
and systematic research that has derived
from French and Raven’s (1959) classifica- 
tion of social powers, there has not been much 
interest in this question. Yet it is clear that 
one can distinguish between questions con- 
cerned with leadership style (i.e., how the 
leader behaves with respect to decision 
making, consideration, task orientation, etc.) 
and with leadership power. Several studies
have reported that the influence of a leader
over subordinates varied systematically with
his control of resources, independently of
his style of leadership (Kipnis, 1958; Pelz,
1951).

The purpose of the present study was to 
investigate one aspect of the relation between 
social power and supervisory behavior. The 
study investigated (a) the range of social 
powers available to supervisors when correcting
subordinate behavior and (b) the situational
and personal factors influencing the
supervisors’ use of these powers. The present
research, conducted in an industrial setting,
was a follow-up of an earlier unpublished
investigation (Kipnis, Lane, & Frankfurt,
1961) of naval supervisors. The military
study found that a variety of personal and 
situational factors influenced the military 
supervisors’ choice of corrective powers. 

The following findings from the military 
study are relevant to the present research: 

1. Military supervisors supervising large 
numbers of men relied upon their legal powers 
to punish by placing subordinates on report— 
a procedure often culminating in court- 
martial. 

2. As the complexity of the problem in- 
creased, military supervisors (a) more fre- 
quently transferred subordinates to a different 
set of duties and (b) increased the number
of corrective powers used. 

3. There appeared to be a “treatment of 
choice” associated with problems presented by 
subordinates. Different powers were invoked 
by the supervisor according to the type of 
problem presented by the subordinate. 

4. Experienced supervisors were more likely
than inexperienced supervisors to correct di- 
rectly a subordinate’s behavior. Inexperienced 
supervisors either referred subordinates to 
someone else, or relied upon their legal 
powers. In a follow-up study on this last point 
(Kipnis & Lane, 1962), it was found that 
supervisors who lacked confidence in their 
leadership talents were more likely to use 
the latter forms of corrective powers. 

It was clear that the range of corrective 
powers reported by these military supervisors 
was not chosen because of whim or individual
idiosyncrasies. No supervisor mentioned physical
coercion, fines of money, or excessive
restrictions of personal liberties. The military 
supervisors’ descriptions of their behavior 
were in fact descriptions of the constraints 
imposed upon them by the organizational 
structure. If the magnitude and variety of 
corrective powers permitted the supervisor 
were increased or decreased, it could be 
expected that the reports of the supervisors 
would be correspondingly altered. 
METHOD


The same procedure used in the military study 
was used in this investigation. An open-ended ques- 
tionnaire was administered to a sample of 184 super- 
visors from five different companies engaged in light 
manufacturing. The questionnaire was given on the 
first day of a supervisory training course. The 
questionnaire asked each supervisor to describe an 
incident that occurred within the past year in which 
a subordinate’s behavior was below average. In addi- 
tion, supervisors were asked to describe what was 
done about the incident by themselves or by others. 
Respondents were asked to give the following infor- 
mation concerning their subordinates: (a) number 
directly supervised, (b) union or nonunion members, 
and (c) hourly or salaried pay. In addition, they 
were asked how many years they had been a super- 
visor. This procedure provided a listing of the kinds 
of corrective powers available to the supervisors, as 
well as the frequency with which each of these 
corrective actions was used. 

Usable returns were obtained from 131 supervisors.
Of the remainder, 25 returns described incidents
that happened to someone else and 28 described
cases that involved female subordinates. It was
decided to analyze only the male returns. The over- 
whelming majority of the supervisors (89%) were 
directing hourly paid, blue-collar workers. Hence the 
findings should not be generalized to salaried, white- 
collar samples. 

The problems and actions taken by the supervisors 
were coded according to a classification system used 
in the naval study. This system was used directly 
with the industrial sample, with the addition of the 
category of man fired and the substitution of the 
category written warning for the category written 
report. 

The kinds of problems presented by subordinates 
were classified as follows. 

1. Attitude—The subordinate showed a lack of 
interest in the company, work, or personal ad- 
vancement. 

2. Discipline—The subordinate failed to follow the 
rules of conduct prescribed by the company. 

3. Work—The subordinate failed to maintain 
minimum standards in performing work. 

4. Appearance—The subordinate failed to dress 
appropriately. 


Corrective Actions 


The ways in which the supervisors reported 
handling the problems were classified into eight cate- 
gories. Since many supervisors reported taking more 
than one action, multiple coding was used. However, 
this multiple coding was used only between, and 
not within, categories. 

1. Verbal: (a) Diagnostic talk—An attempt was 
made by the supervisor to find out the reasons for 
the subordinate’s unacceptable behavior. (b) Cor- 
rective talk—The supervisor pointed out the conse- 
quences of the subordinate’s substandard behavior, 
and/or discussed ways in which the subordinate 
could improve. There was no indication that the
supervisor tried to find out why the subordinate 
was behaving as he did. 

2. Increased supervision: (a) Extra instruction— 
Additional or extra instruction in the area of the sub- 
ordinate’s poor behavior was assigned, or the super- 
visor spent extra time with the subordinate, closely 
directing his work. (b) Inspection—Frequent checkups
were made on the subordinate’s performance.

3. Situational change: (a) Reassign—New or addi- 
tional duties were assigned to the subordinate, or 
the subordinate was reassigned to a different task. 
The reassignment was not made for purposes of 
punishment. (b) Transfer—The subordinate was
transferred to a different department or shift. 

4. Penalty: (a) Reprimand—The subordinate was 
rebuked for his below-standard behavior. (b) Extra 
work—The supervisor assigned difficult or dirty 
work. (c) Reduced privileges—The subordinate 
was penalized by temporarily denying or reducing 
privileges. 

5. Refer: The subordinate was referred to a 
superior, a peer, a specialist, or to the personnel 
office. The supervisor consulted with others as to 
what to do. Included here were two cases where 
the supervisor did nothing to correct performance. 

6. Written warning (report for military): The 
subordinate was given an official written warning 
from the company advising him that his perform- 
ance was unacceptable. 

7. Man fired: The subordinate was discharged 
from the company. 

8. Example: The supervisor acted as a model in 
the subordinate’s problem area. This involved no 
direct attempts at instruction. 


TABLE 1

SUBORDINATE PROBLEMS REPORTED BY SUPERVISORS

Problem of subordinate                     Industry a   Military b

Appearance                                     0%           9%
Attitude                                       8%           7%
Discipline                                    27%          14%
Work                                          47%          42%
Work and attitude                              8%           1%
Work and discipline                            5%           5%
Work and appearance                            1%           6%
Other multiple combinations of problems        3%          10%
Totals                                       100%         100%

Totals
Total work problems mentioned                 62%          67%
Total discipline problems                     36%          24%
Total attitude problems                       18%          23%
Total appearance problems                      2%          23%

a N = 131.
b N = 146.
Outcomes


While not requested, some supervisors wrote on 
the questionnaire that the corrective actions taken 
had improved the subordinate’s behavior and no 
further difficulty with the subordinate ensued. This 
information was coded in the following manner: 
(a) improvement—the actions taken corrected the 
behavior; and (b) not reported. 


RESULTS 


Table 1 shows the kinds of supervisory 
problems reported by the supervisors. For 
comparative purposes, the distribution of 
problems reported by the naval sample is also 
shown. 

Examination of the totals in Table 1 shows 
that problems of appearance were mentioned 
more frequently by the military, while prob- 
lems of discipline were mentioned less fre- 
quently. Two-thirds of the problems men- 
tioned by both industrial and military super- 
visors involved getting subordinates to do 
their work properly. Problems of motivation, 
as reflected in the incidence of attitudinal 
problems, were mentioned by up to 18% of 
the industrial sample and 23% of the mili- 
tary sample. These latter findings illustrate 
the well-known fact that socioemotional 
problems constitute an important aspect of 
supervision. 

Table 1 also indicates that most subordinates
manifested one problem at a time to
their supervisors. In 17% of the industrial 
descriptions and in 28% of the military de- 
scriptions, subordinates were described as 
manifesting two or more supervisory problems 
simultaneously. We shall return to this find- 
ing when we consider the relation between 
problem complexity and solutions attempted. 

The corrective actions taken by the indus- 
trial sample are shown in Table 2. For pur- 
poses of comparison, the actions taken by 
military supervisors are shown also. It should 
be noted that the military classifications do 
not include the category man fired, since such 
a corrective action is not used in the mili- 
tary. Because 43% of the industrial super- 
visors and 51% of the military stated they 
used more than one corrective action, the total 
percentages shown in Table 2 exceed 100%. 

Both industrial and military supervisors 
relied upon a wide variety of powers to cor- 
rect performance. Many of the actions were 
based upon the supervisor’s persuasive
powers; others relied upon actual or verbal 
threats of punishment; others upon the expert 
knowledge of the supervisor; and still others 
upon the power of the supervisor to make 
changes in the work environment of the subordinate.
Between 8% and 10% of both the industrial
and military supervisors invoked higher
administrative levels through the actual firing 
of the subordinate, or through the use of 
official warnings, or through formal reports. 
Finally, about 15% of both groups consulted 
with someone as to what to do about the 
problem, or referred the subordinate else- 
where. In essence, these listings represent the 
range of powers that the industrial and mili- 
tary organizations allowed their supervisors 
to use. 

It may also be observed in Table 2 that 
industrial supervisors were less likely than 
military supervisors to attempt direct changes 
in their subordinates’ behaviors. That is, sig- 
nificantly fewer industrial supervisors reported 
using extra instruction (p < .01), or changing 
the pattern of the subordinate’s job duties in 
an attempt to correct performance (p < .01). 
In terms of direct punishments that did not 
involve formal proceedings, industrial super- 
visors more frequently relied upon reprimand- 
ing their subordinates (p< .01), whereas 
military supervisors used punishments that 
directly changed the subordinates’ working 
conditions through extra work assignments or 
reduced privileges. 
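
As an illustrative sketch only, the chi-square comparisons reported for Table 2 can be approximated from the published percentages. The counts below are assumptions, back-calculated from the rounded percentages and the two sample sizes (N = 131 and N = 146). The function names are hypothetical, and the original analysis may have differed in detail (for example, in the use of a continuity correction).

```python
# Sketch: 2 x 2 Pearson chi-square test comparing how many industrial versus
# military supervisors reported a given corrective action.
# ASSUMPTION: counts are reconstructed from the rounded percentages in
# Table 2, so the statistics are approximate.

N_INDUSTRY, N_MILITARY = 131, 146
CRITICAL_05 = 3.841  # chi-square critical value, df = 1, alpha = .05
CRITICAL_01 = 6.635  # chi-square critical value, df = 1, alpha = .01


def approx_count(percent, n):
    """Recover an approximate count from a rounded percentage."""
    return round(percent / 100 * n)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def compare(pct_industry, pct_military):
    """Chi-square for 'used the action' versus 'did not', by sample."""
    used_i = approx_count(pct_industry, N_INDUSTRY)
    used_m = approx_count(pct_military, N_MILITARY)
    return chi_square_2x2(used_i, N_INDUSTRY - used_i,
                          used_m, N_MILITARY - used_m)


extra_instruction = compare(19, 33)  # reported as p < .01
corrective_talk = compare(42, 33)    # reported as ns

print(f"extra instruction: chi-square = {extra_instruction:.2f}")
print(f"corrective talk:   chi-square = {corrective_talk:.2f}")
```

Under these assumed counts, a statistic above 6.635 corresponds to p < .01 with one degree of freedom, and a statistic below 3.841 is nonsignificant at the .05 level.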


Relation between Problems and Actions 


The first study found a “treatment of 
choice” associated with each problem en- 
countered. To determine if this held in the 
present sample, the 109 supervisors who re- 
ported that their subordinates presented only 
a single problem were sorted into three prob- 
lem areas of attitudes, discipline, and work. 
The distribution of corrective actions taken 
for each problem area was then determined. 

Diagnostic talks were used more frequently 
in incidents involving attitudes or discipline 
than in incidents involving work (31% versus 
15%, p< .05). Increased supervision was 
used more frequently in problems of work 
than in problems involving attitudes and 
TABLE 2

CORRECTIVE ACTIONS TAKEN BY SUPERVISORS

                                                        p c of industrial
Corrective action           Industrial a   Military b   versus military

Verbal
  Diagnostic talk               23%           18%          ns
  Corrective talk               42%           33%          ns
Increased supervision
  Extra instruction             19%           33%          <.01
  Inspection                     7%           10%          ns
Situational change
  Reassign                       3%           18%          <.01
  Transfer                       8%            1%          -d
Penalty
  Reprimand (verbal)            16%            5%          <.01
  Extra work                     0%            9%          -d
  Reduced privileges             1%            8%          -d
Refer                           15%           15%          ns
Written warning (report)         7%           10%          ns
Man fired (industry only)        8%
Set example                      1%            7%          -d

a N = 131.
b N = 146.
c p values obtained through chi-square analyses in which
the number of industrial and military supervisors stating they
carried out the action were compared.
d Chi-square not computed because of small Ns involved.



discipline (45% versus 6%, p < .01). Finally,
14% of the supervisors with discipline problems
stated that the subordinate was fired,
as compared to 0% of the supervisors with 
attitude problems and 3% of the supervisors 
with work problems. It appears that sub- 
ordinates are most likely to be fired for 
breaking rules. These findings closely parallel 
the original military findings. In that study, 
poor work was associated with increased 
supervision, poor attitudes with diagnostic 
talks, discipline problems with official reports 
and/or diagnostic talks, and poor appearance 
with frequent inspections. 

Another finding had to do with the complexity
of the problems presented by the
subordinate. In both studies, when the sub- 
ordinate presented two or more problems 
simultaneously (e.g., poor attitudes and poor 
work), supervisors changed the job environ- 
ment of the subordinate. The action of trans- 
fer was reported by 18% of the industrial 
supervisors with complex problems and 5% 
of the industrial supervisors with simple prob- 
lems (p< .10). In the military study, the 
action of reassignment was used by 39% of 
the supervisors reporting complex problems 
and 10% of the supervisors with simple 
problems (p < .01). 
In addition to the specific kinds of actions
used, problem complexity was also related to 
the number of corrective actions used by the 
supervisor. Two or more corrective actions 
were used by 38% of the industrial super- 
visors with simple problems and by 62% of 
the industrial supervisors with complex prob- 
lems (p< .05). In the military study two 
or more corrective actions were reported by 
41% of the supervisors with simple problems 
and 76% of the supervisors with complex 
problems (p < .01). 


Years of Experience as a Supervisor 


The military study found that less experi- 
enced supervisors more frequently referred 
their subordinates to someone else. This find- 
ing was repeated in the present study. 
Twenty-seven percent of the supervisors 
(N = 40) with 2 yr. or less experience stated 
that they referred the subordinate’s problem 
to someone else as compared to 7% of the 
industrial supervisors (N = 30) with 3-8 yr.
of experience and 12% of the industrial supervisors
(N = 42) with 9 or more yr. of experience
(p < .05).3 However, the present study
found no evidence that inexperienced super- 
visors used official warnings as was true of 
inexperienced military supervisors. 


Number of Men Supervised and 
Actions Taken 


There is general agreement that the more 
men the supervisors are required to direct, 
the less able they are to give their men per- 
sonal attention (Dale, 1959; Yoder, 1956). 
Support for this contention was found in the 
military study, in that military supervisors 
directing large numbers of men were less 
likely to use extra instruction and more likely 
to place subordinates on official report. 

In the present study it was also found that
as the number of men supervised increased,
the use of official warnings increased. Seventeen
percent of the supervisors (N = 35)
directing 15 or more subordinates, 3% of
those (N = 36) directing 7-14 subordinates,
and 0% of those (N = 40) directing fewer than
7 subordinates used official warnings as a
means of correcting subordinates’ performances
(p < .01, comparing 14 or fewer versus
15 or more). The use of extra instruction,
however, was not related to number of men
supervised. There was no relation between
years of experience as a supervisor and number
of men supervised.4

3 Nineteen supervisors did not report their years
of experience.
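
The comparison of official warnings by span of control involves very small counts, so it can be checked with an exact test. The sketch below is a hedged illustration: it substitutes a two-sided Fisher exact test for whatever test the study actually used, and it assumes counts back-calculated from the rounded percentages (about 6 of 35 supervisors directing 15 or more men, and about 1 of 76 directing 14 or fewer). The function name is hypothetical.

```python
# Sketch: two-sided Fisher exact test for official warnings by number of men
# supervised. ASSUMPTIONS: the counts are reconstructed from rounded
# percentages, and Fisher's test is a substitute chosen here because the
# cell counts are small; it is not stated in the source.
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: 15 or more men supervised; 14 or fewer men supervised.
# Columns: used official warning; did not use official warning.
p_value = fisher_exact_two_sided(6, 29, 1, 75)
print(f"two-sided Fisher exact p = {p_value:.4f}")
```

With these assumed counts the exact p-value falls below .01, which agrees with the significance level reported for this comparison.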


Union-Nonunion


There were only 12 supervisors of union
members. While this number was too small
for statistical analysis, inspection revealed
that supervisors were less likely to talk to
union members than to nonunion members,
less likely to spend time with them in extra
instruction, and more likely to reprimand
them or to issue official warnings as
preferred means of correcting performance.


Actions and Outcomes 


The more corrective actions (excluding 
firings) the supervisor reported, the more 
likely he was to state that the subordinate’s 
performance improved. Twenty-seven percent 
of the supervisors who used one corrective 
action and 52% of the supervisors who 
used two or more corrective actions stated 
that their subordinates’ behavior improved 
(p < .05).


DISCUSSION


The findings point to the important role 
of corrective powers in supervisory decision- 
making and problem-solving behaviors. It 
appears that the range of corrective powers 
controlled by supervisors represents the range 
of potential solutions that they may try when 
correcting subordinates’ performance. This
problem-solving interpretation was suggested 
in the present study by the relationship be- 
tween the kind and complexity of the problem 
presented by subordinates and the kind and 
number of corrective actions used by supervisors.
For example, complex problems led
to the supervisor’s trying more corrective 
actions, and each kind of subordinate problem 
evoked a different corrective action. It would 
follow from this interpretation that as the 
range of corrective powers that is allowed the 
supervisor is increased or decreased by management,
one could expect corresponding increases
or decreases in the supervisors’ abilities
to correct subordinates’ performance.

4 Twenty supervisors did not report the number of
men they supervised.

It was further found that industrial super- 
visors were less directive in their attempts to 
correct behavior than military supervisors. 
The military supervisor more often corrected 
subordinates’ performance by changing their 
duties, increasing the amount of direct super- 
vision, and by penalizing them by assigning 
extra work and/or invoking penalties. The 
industrial supervisor relied more on his per- 
suasive powers through use of diagnostic talks, 
corrective talks, or verbal reprimands. 

Does this mean that the industrial supervisor
is more “tenderhearted,” or does not
have a tradition of using more direct forms 
of action? Even a cursory reading about the 
industrial management scene from the 1880s 
through the beginning of World War II would
indicate that this is not the case. Prior to 
World War II the industrial supervisor was 
more likely to be the person allowed by man- 
agement to have a major voice in hiring, 
firing, demotions, layoffs, wages, and the gen- 
eral regulating of the working conditions of his 
subordinates. A supervisor of that time was 
nicknamed aptly “bull of the woods.” Since 
that time, however, supervisory powers have 
been reduced by union contracts, delegation 
of responsibilities to staff personnel, and broad 
changes in management philosophy. As a
result of these recent events, industrial super- 
visors control a smaller range of corrective 
powers than do their military counterparts. 
This reduced control is believed to be reflected in
the greater reliance of industrial supervisors, 
in the present sample, upon verbal persuasion 
rather than upon more direct attempts to 
influence subordinates. 

Further questions concerned with leader- 
ship powers can be organized into three areas. 
The first is concerned with situational factors 
that influence the use of supervisory powers. 
The present study found differences between 
military and industrial organizations. Within 
each organization, span of control and kind 
of problem also influenced supervisors’ corrective
actions.

A second question is concerned with the 
supervisor’s own response to his possession 
of social powers. An interesting study by 
Lange and Jacobs (1960) of the actual day-to-day
behaviors that distinguish between
effective and ineffective platoon leaders 
revealed that ineffective leaders used their 
powers to reward and punish inappropriately. 
Instead of rewarding or punishing subordi- 
nates in terms of actual job performance, 
they used these sanctions to curry favor or 
to punish irrationally. Our research (Kipnis 
& Lane, 1962) strongly suggests that inex- 
perience and lack of confidence may make 
the supervisor reluctant to use the full range 
of powers that he controls. 

The third question that requires attention 
has to do with the subordinate’s response to 
supervisory powers. Does reprimanding an 
employee do any good besides allowing the 
supervisor to “blow off steam”? Does reassigning
the problem employee improve his
performance? In what ways are the em- 
ployee’s self-esteem, ideological allegiances, 
and morale affected by reliance upon the vari- 
ous forms of power? Many students suggest 
that the possession of a broad complex of 
powers by a leader causes the subordinate 
to feel uneasy, distrustful, and reluctant to 
reveal weaknesses in himself to his super- 
visors (Hutchins & Fiedler, 1960; Mulder, 
1959; Wispé & Lloyd, 1955). These sub- 
ordinate feelings in turn lead to distortions 
in upward communications and approach- 
avoidance conflicts over interactions with 
supervisors (Mellinger, 1956; Read, 1962). 
Thus it may prove that increasing the range 
of powers controlled by the supervisor will 
improve his problem-solving abilities, but at 
the expense of provoking more guarded and 
defensive behaviors among subordinates. 
IMPACT OF EMPLOYEE PARTICIPATION IN THE
DEVELOPMENT OF PAY INCENTIVE PLANS:
A FIELD EXPERIMENT

EDWARD E. LAWLER III 2 AND J. RICHARD HACKMAN
A study of effects of employee participation in the development of pay incentive
plans. The Ss were part-time workers who clean buildings in the evenings. Three 
autonomous work groups developed their own pay incentive plans to reward 
good attendance on the job (Condition A). These plans were then imposed by 
the company on other work groups (Condition B). There were two groups of 
control Ss: One talked with Es about job attendance problems but received no 
additional experimental treatment, and the other received no treatment. A 
significant increase in attendance followed only Condition A. Possible reasons 
cited: (a) participation caused Ss to be more committed to the plan; (b) Ss 
who participated in the development of their plan were more knowledgeable 
about it; and (c) participation increased the employees’ trust of the good 
intentions of management with respect to the plan. 


Literally thousands of different pay incen- 
tive plans have been developed and used in 
an effort to increase the motivation of em- 
ployees in work organizations. These plans 
have tried to motivate a number of different 
behaviors: productivity, sales, cost reduction, 
job attendance, etc. They also have differed 
widely in form: Some have used individual 
incentives, others have provided rewards on 
a group or organization-wide basis; some have 
been based on small units of behavior while 
others have been based on relatively long-term 
performance. 

A good deal of research has attempted to 
determine the relative effectiveness of dif- 
ferent kinds of incentive plans. For example, 
group plans have been compared with indi- 
vidual plans and bonus plans have been com- 
pared with salary increase plans. The results 
suggest that these characteristics of plans do 
affect their success (see, e.g., Opsahl &
Dunnette, 1966; Rothe, 1960; Viteles, 1953). 
Nevertheless, it is striking to note the number 
of instances in which an identical plan is suc- 
cessful in one situation but unsuccessful in 


1 The authors would like to thank H. Elston, M. 
Nunes, and R. Breck for their cooperation with the 
study. Wendy Silin deserves special mention for her 
help with the data analysis. 
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Lawler III, Department of Administrative Sciences, 
Yale University, 2 Hillhouse Avenue, New Haven, 
Connecticut 06520. 


another (e.g., Whyte, 1955). Apparently the 
success of pay incentive plans is determined 
by more than just the operating mechanics of 
the plans themselves. 

Virtually no research has been done to 
identify the nonmechanistic factors that are 
important in determining the success of pay 
plans. The present study attempts to deter- 
mine the impact of one such nonmechanistic 
factor on the success of a pay plan. Specif- 
ically, this study examines how the way a 
pay plan is developed and implemented in- 
fluences its effectiveness. 

Previous research (e.g., Coch & French, 
1948) suggests several reasons why the way 
a plan is developed and introduced may be 
crucial in determining how successful it is. 

If a pay plan is to be successful it is im- 
portant that all the participants understand 
it, be committed to it, and believe that it will 
be administered fairly. Clearly, all of these 
factors can be influenced strongly by the way 
the plan is developed and introduced. A pay 
plan that is developed by a mistrusted man- 
agement and imposed upon workers is not 
likely to be understood by employees, nor 
are the workers likely to be committed to its 
success. On the other hand, a plan that is 
participatively developed by workers is very 
likely to be understood by them and they are 
more likely to be committed to its success. 
The plan also is likely to be appropriate to the
situation in which the workers find themselves.
The basic hypothesis of the study
therefore is: Pay incentive programs will be 
more effective if they are participatively de- 
veloped than if they are imposed upon a group 
of employees by management. It should be 
noted here that there is no intention to indi- 
cate that just because a pay plan is partici- 
patively developed it will be successful. It 
seems unlikely that any plan will work if 
it is not set up properly and administered 
well, even if it is participatively developed. 


METHOD 
Research Strategy 


The basic hypothesis of the study was tested in 
a field experiment. The experimental approach was 
chosen because it allows the causal impact of the 
experimental factor (in this case, participative de- 
velopment of the pay plan) to be assessed with 
relatively little ambiguity. The study was done in a 
field setting to ensure that the manipulation would 
be realistic and important to Ss and to increase 
the likelihood that the results would be generalizable 
to other field settings. 


Research Site and Subjects 


The research was conducted in a small company 
that provides building maintenance services on a 
contract basis. The Ss were part-time employees of 
the company, who clean buildings during the evening. 
Most Ss worked 4 hr. a night. Prior to the study, 
the company had experienced extremely high rates of 
absenteeism and turnover among these employees. 

The Ss worked in groups ranging in size from 
2 to 25. There were about 15 such groups in the 
company at the time of the study. Each group was 
responsible for doing all the cleaning work in one 
building. Although the groups did similar work, 
they were highly autonomous. The employees al- 
ways reported for work at the building they were 
to clean and never came to the company offices. 
Because of this there was virtually no contact 
between employees in different work groups. 

The Ss tended to have very low educational levels, 
and most were members of minority groups. A 
number of them were illiterate. Approximately half 
the Ss were women, many of whom were housewives 
during the day. For most of the male Ss, the 
maintenance work was a second job. The Ss ranged 
in age from 16 to over 70. 


Procedure 


Nine work groups were involved in the experiment. 
Three designed their own incentive plans (the par- 
ticipative groups), two had incentive plans imposed 
on them (the imposed groups), two talked with 
the researchers but their pay plans were not changed, 
and two received no treatment at all (the control
groups). There were no apparent differences among 
the groups assigned to the different treatment condi- 
tions. The groups all worked in comparable buildings, 
and the members of the different groups were similar 
demographically (e.g., age, education, experience, and 
social class). 

Participative groups. Three work groups (of 10, 
9, and 8 members) were selected as participative 
experimental groups. Both of the authors met with 
one of the groups to help the employees develop 
an incentive plan, and each of the authors met sepa- 
rately with one of the other two groups. Both 
researchers worked with one of the groups, so that 
they would be able to behave in similar ways with 
the groups they were handling on an individual basis. 

In all cases the researchers were introduced to the 
employees during regular working hours by a member 
of top management. The manager told the employees 
that the company was concerned about high rates 
of absenteeism, and expressed his hope that the 
employees would work with the researchers in 
developing an appropriate plan for rewarding good 
attendance. 

The manager then left and the researcher opened 
the discussion by emphasizing that it was his objec- 
tive to help the employees develop a plan and not 
to tell them what kind of plan they should develop. 
In all three groups an extensive discussion followed 
this introduction. During the initial phases of the 
discussion the workers expressed a great deal of 
mistrust of the researcher and they displayed con- 
siderable hostility toward both the researcher and 
the company. They continually demanded to know 
what kind of plan the company wanted them to 
develop, and they asked why the researcher was 
interested in working with them. The researcher 
allowed the initial discussion to continue for about 
45 min., at which point he asked the employees to 
talk things over among themselves and said that he 
would be back the next night to continue the dis- 
cussion. The employees then returned to their work 
and the researcher left. 

For all three groups the second meeting resulted 
in much more progress than did the first. Although 
still suspicious of the motives of the company and 
the researcher, the employees began to discuss what 
might constitute an acceptable plan. Much of this 
discussion focused on how large the bonus should be. 
During the discussion the researcher took on the role 
of a resource person for the group. At no time did 
he suggest a plan, although if he was asked about 
a specific idea he did react to it by stating a few 
general principles about what makes for successful 
pay incentive plans (e.g., it is important to relate 
pay to behavior). One group decided to ask for a 
very large amount (“since the company will cut 
whatever we ask for in half anyway”); the other 
two groups seemed more responsible and settled on 
dollar amounts that were less than the company 
had originally anticipated offering. 

By the third meeting in one group and the fourth 
in the others, a plan had been developed and agreed 
to by all group members. The three plans that were 
developed by the groups did show some important 
differences. Two groups wanted their bonus to be 
computed on a weekly basis, while the other group 
wanted a monthly bonus. There were also differences 
in the size of bonuses requested and in the number 
of days of sick leave that should be allowed. The 
researchers presented the plans to management, and 
they all were quickly accepted with minor altera- 
tions. The alterations involved adjustment of the 
amount of the bonuses so that they would be equiva- 
lent for all three groups and specification of what 
would constitute an “excused” absence from work. 

As finally instituted, all plans offered cash bonuses 
of about $2.50 per week for perfect attendance. In 
one plan the bonuses were to be computed and paid 
at the end of each month, while the other two plans 
were on a weekly basis. When the plans were insti- 
tuted, a manager of the company returned to each 
group to answer any final questions and to explain 
why the changes had been made in the original pro- 
posals of the work groups. He did not ask them to 
approve the changes formally. 

Imposed groups. Plans identical to those developed 
by the employees in the participative condition were 
imposed by the company on two other groups 
(N = 13; N = 26). One group received the weekly 
plan and the other received the monthly plan. 

The same manager who had worked with the 
researchers in the participative condition instituted 
the plans in these groups. Accompanied by one of 
the researchers, he met with the groups and explained 
why a bonus plan was being instituted and how it 
was to operate. He spent considerable time with each 
of the groups, and appeared to do an adequate job 
explaining the plans and showing the employees how 
they personally would benefit if they came to work 
more regularly. 

Control groups. The researchers visited two other 
groups (N = 9; N = 8) and talked with these employees 
at length about incentive plans and about 
problems of absenteeism and turnover. It was 
stressed in these meetings that the researchers were 
interested in studying how people react to wages and 
that the company was concerned about the current 
high rates of absenteeism. No changes were made in 
the pay plans of these two groups. In two other 
groups (N = 26; N = 8) no changes in the pay plans 
took place and the researchers did not meet with the 
employees, but the attendance of the groups was 
monitored. 


RESULTS 


Results are presented in Figures 1 and 2. 
The data are expressed in terms of the per- 
centage of an employee’s scheduled work 
week that he actually worked. For most em- 
ployees the work week was 20 hr. long. Thus, 
if an employee worked 10 hr., he was scored 
as working 50% of his scheduled hours; if he 
worked 15 hr., he was scored as 75%. Figure 1 
shows the mean percentage of scheduled hours 
actually worked for Ss in the three participative 
groups. Figure 2 shows analogous data 
for Ss in the two groups that had incentive 
plans imposed on them. In both figures, data 
are presented for the 12 wk. before the plans 
were instituted and for the first 16 wk. after 
the plans went into effect. 

Fig. 1. Mean attendance of the participative groups for the 12 wk. before the 
incentive plan and the 16 wk. after the plan. (Attendance is expressed in terms 
of the percentage of hours scheduled to be worked that were actually worked.) 

Fig. 2. Mean attendance of the imposed groups for the 12 wk. before the 
incentive plan and the 16 wk. after the plan. (Attendance is expressed in terms 
of the percentage of the hours scheduled to be worked that were actually 
worked.) 
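
The attendance measure described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the only values taken from the text are the 20-hr. work week and the 10-hr. and 15-hr. examples.

```python
def percent_of_scheduled_hours(hours_worked, hours_scheduled=20.0):
    """Percentage of an employee's scheduled hours that were actually worked."""
    if hours_scheduled <= 0:
        raise ValueError("hours_scheduled must be positive")
    return 100.0 * hours_worked / hours_scheduled


def mean_attendance(hours_by_employee, hours_scheduled=20.0):
    """Mean attendance percentage across the employees of one group."""
    scores = [percent_of_scheduled_hours(h, hours_scheduled)
              for h in hours_by_employee]
    return sum(scores) / len(scores)


# The two worked examples given in the text.
print(percent_of_scheduled_hours(10))  # 50.0
print(percent_of_scheduled_hours(15))  # 75.0
```

Averaging these per-employee percentages within a group and week is assumed here to be how each point in Figures 1 and 2 was obtained.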

Before the incentive plans were introduced, 
the average employee in the participative 
groups worked 88% of his scheduled hours; 
after the plan went into effect, the average 
employee worked 94% of his scheduled hours. 
This before-after difference was tested for sta- 
tistical significance by a median test yielding 
a chi-square of 9.35 (p < .001).2 
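
A median test of this kind can be sketched as follows. The sketch rests on assumptions: it treats the before and after scores as two samples, splits them at the pooled median, and computes a 1-df chi-square without a continuity correction. The text does not say which variant was used, and the attendance scores below are invented for illustration; they are not the study data.

```python
from statistics import median


def median_test_chi_square(before, after):
    """Chi-square (1 df, no continuity correction) for a two-sample median test."""
    grand_median = median(list(before) + list(after))
    # 2 x 2 table: above vs. not above the pooled median, by period.
    a = sum(1 for x in before if x > grand_median)
    b = sum(1 for x in after if x > grand_median)
    c = len(before) - a
    d = len(after) - b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a marginal total is zero; the test is undefined")
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical attendance percentages, for illustration only.
before = [80, 85, 85, 90, 90, 95]
after = [90, 95, 95, 100, 100, 100]
chi_square = median_test_chi_square(before, after)
```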

As is shown in Figure 2, there was no im- 
provement in attendance for groups in which 
the identical incentive plans were imposed by 
management. Before the imposed plans were 
instituted, the average employee worked 83% 
of his scheduled hours; in the 16 wk. after 
the plans were put into effect, the figure 
remained at 83%. 

Data gathered from the control groups 
(whose pay plans were unchanged) showed 
no significant changes during the period of the 
study. 

2 This before-and-after comparison essentially uses 
each participative group as its own control group. 
This was done because the initial differences between 
the attendance levels of the imposed and the par- 
ticipative groups make post-comparisons between 
these groups artificial. 

Thus, the data show that employee at- 
tendance improved only in those groups that 
participatively developed their own incentive 
plans. Neither the incentive plan alone nor 
participation and discussion alone yielded any 
changes in attendance. 


DISCUSSION 


The results of this study strongly support 
the notion that attention to the technical 
characteristics of a pay plan alone (i.e., the 
mechanics of its design and administration) 
may be insufficient to ensure the success of 
the plan. Indeed, the data suggest that par- 
ticipation in the development and implemen- 
tation of a plan may have more of an impact 
on the effectiveness of a plan than the 
mechanics of the plan itself. 

Why should participation be so important 
to the success of a pay incentive plan? One 
possibility is that participation can improve 
the quality of decisions that are made 
(Vroom, 1964). Thus, it could be argued that 
the participatively developed plans in this 
study were uniquely suited to the groups that 
developed them—but somehow inappropriate 
for the other groups on which the plans were 
imposed. This explanation seems unlikely in 
the present study, however, since all the work 
groups in the company were highly similar 
and there was nothing especially unique 
about the plans developed by the three par- 
ticipative groups. 

It did seem that members of the partici- 
pative groups more fully understood the plans 
than did members of the imposed groups. 
Despite a carefully rehearsed introductory 
talk given by the company manager, the im- 
posed groups did not receive as much infor- 
mation about the plan as did the participative 
groups. They did not feel as free to ask 
questions about the plan as did the partici- 
pative groups, and they did not have as much 
time to think about questions or to ask them 
as did the participative groups. The imposed 
groups received all their information about 
the plan in one session, whereas members of 
each participative group talked together about 
the plan for several hours over a week or 
two—possibly increasing their understanding 
of the plan and its implications. 

It also appeared that the participative 
groups were more committed to the success 
of the plans than were the imposed groups. 
There was evidence that the plans were viewed 
as “just another attempt by management to 
exploit us” by some members of the imposed 
groups. By participating in the development 
of the plans, many members of the partici- 
pative groups appeared to become more trust- 
ing of management’s intentions to administer 
the plans fairly. Their pride in “owning” the 
plans, coupled with the increased trust of 
management, may have enhanced considerably 
the desire of the participative employees 
to cooperate in making the plans a success. 

A final comment may be in order about 
the characteristics of the S population studied. 
The employees were, almost without excep- 
tion, of a low socioeconomic class, and all 
were working at low-level jobs. Thus, it is 
perhaps surprising that they responded to the 
opportunity for participation as well as they 
did. For most of them it was the first time 
they had ever had an opportunity to con- 
tribute meaningfully to any decision making 
about their jobs. There was hostility and 
suspicion at the outset of the experiment. Yet, 
after the initial discussion with the re- 
searchers, a substantial number of the em- 
ployees began to respond to the challenge 
of developing a viable incentive plan—and 
they ultimately came up with plans that 
would have to be considered technically ade- 
quate. Thus it appears that—if a researcher 
or a manager is willing to deal with some 
initial hostility and suspicion—it should be 
possible to involve most employees in mean- 
ingful decision making about their jobs. 
And, if the results of this study have gen- 
eralizability, the payoff for both the employees 
and the organization should make the effort 
well worthwhile. 


Two pilot studies of consumer attitudes toward auto versus public transport 
modes were conducted in Baltimore, Maryland, and Philadelphia, Pennsylvania 
(N = 550 and 471, respectively), using a quasi self-administered Likert-scaled 
questionnaire. Results indicate a consistently favorable preference for auto 
over a wide range of mode attributes but with marked differences in mag- 
nitude of preference between attributes. Significantly different patterns were 
also found between inner-city and suburban preferences. Implications for 
changing mode-use patterns are discussed briefly. 


Most transportation consumer research has 
been of the origin-destination variety that 
provides a detailed description of the traveler, 
mode used, and trip purpose (Gilat, 1963). 
Questions have been answered from this research 
about where and how people traveled, 
but an explanation of their behavior generally 
has not evolved. A few studies, however, 
have partially focused on consumer attitude 
measurement emphasizing the identification 
and assessment of consumer values relevant to 
transport selection decisions (Ackoff, 1965; 
Lansing, Mueller, & Barth, 1964; Mahoney, 
1964; Stanford Research Institute, 1965). 

Although in most cases these efforts have 
achieved stated objectives, many have had 
several limitations that restricted the gen- 
eralization of their results. One of the most 
severe has been the small selected samples 
used. Another has been the narrowness of 
focus in terms of such variables as mode, trip, 
and/or characteristics of users. The latter 


1 This study was supported by United States De- 
partment of Commerce, Bureau of Public Roads, 
Contract CPR-11-0960, Project Director: A. N. Nash, 
and is partially based on G. A. Brunner, S. J. Hille, 
A. N. Nash, F. T. Paine, R. E. Schellenberger, and G. 
M. Smerk, User Determined Attributes of Ideal 
Transportation System, College Park: University of 
Maryland, 1966, 228 pp., and F. T. Paine, A. N. 
Nash, S. J. Hille, and G. A. Brunner, Consumer 
Conceived Attributes of Transportation, College Park: 
University of Maryland, 1967, 177 pp. 

2 Requests for reprints should be sent to Allan N. 
Nash, Department of Business Administration, Uni- 
versity of Maryland, College Park, Maryland 20742. 
made it difficult to compare and contrast results 
between studies because different variables 
were included in each. In some cases the 
method of collecting data was not carefully 
constructed and/or evaluated. Finally, the 
designs were based on the proposition that the 
researchers knew which modal characteristics 
to study and how to define them. Usually 
abstract variables such as “convenience, com- 
fort, status, congestion, flexibility, expense, 
etc.” have been used as inputs without careful 
specific definitions. Research on various rating 
scales suggests that trait scales with fairly 
specific behavioral descriptions are more reli- 
able and valid than scales evaluating “how 
much” of a trait is possessed by a ratee with 
abstract scale gradations (Stockford & Bissell, 
1949). Experience with checklists also shows 
significant improvement in measurement com- 
pared to results obtained when global traits 
are used (Miner, 1969). 

Two pilot studies described herein at- 
tempted to alleviate partially some of the 
above limitations. 

First, an attempt was made to provide a 
more comprehensive coverage of significant 
variables affecting modal choice decisions. 
Previous models had only a modicum of 
success for predicting modal split decisions, 
probably because these decisions are more 
complex than they were originally thought 
to be. As few as two variables have been used 
(travel time and cost) to predict modal choice, 
and many studies include from four to six 
variables. Their results suggested that the 
development of a valid prediction model for 
modal choice decisions requires the incorpora- 
tion of several attributes into the prediction 
milieu, and requires that the model be sensi- 
tive to the complex interrelationships existing 
among attributes. 

Second, the attitude instrument developed 
sought to determine both the importance of 
and satisfaction with modal attributes. The 
importance of a particular attribute is prob- 
ably a function of both the underlying 
strength of the human need or needs to which 
it is related and its present satisfaction level. 
The inclusion of satisfaction items with the 
importance of items sought to clarify the ex- 
tent to which importance of an attribute is a 
function of its present level of satisfaction as 
well as to assess overall satisfaction with 
existing alternative private and public modes. 
Thus, insight was obtained as to why people 
chose to travel in the modes presently used 
(preponderantly auto). 

Third, these studies focused on the de- 
velopment of factor definitions by subjecting 
a comprehensive pool of specific items tapping 
particular travel characteristics and behavior 
to factor analysis. It was hoped that progress 
would result toward a definition and classifica- 
tion of the attributes perceived by transport 
users as being relatively distinct and im- 
portant variables in the determination of 
their travel behavior. 


METHOD 


The first pilot study conducted in Baltimore, 
Maryland, involved the development and evaluation 
of an instrument directed at three questions of a 
five-question general design. These were (1) What 
attributes do consumers regard as salient in typical 
recent trips? (2) What is the relative importance of 
the attributes for particular trip purposes, and con- 
glomerately? (3) To what extent, and how, are 
demographic characteristics of respondents related to 
the importance of trip mode attributes? 

The second study was completed in Philadelphia, 
Pennsylvania, and includes specific questions relevant 
for all five questions in the design. The last two 
general questions were (4) To what extent do con- 
sumers perceive themselves as being satisfied with 
the attributes of auto versus public transport modes? 
(5) To what extent, and how, are demographic 
characteristics of respondents related to perceived 
satisfaction of trip mode attributes? 

The application of a revised instrument in Phila- 
delphia provided feedback on consistency of answers 
to the original three questions under different transportation 
circumstances from those existing in Baltimore. 


Sample 


Sixty clusters (sampling points) in the Baltimore 
Standard Metropolitan Statistical Area and 80 clusters 
in the Philadelphia Standard Metropolitan Statistical 
Area were selected following a multistage area prob- 
ability sampling design. The Baltimore sample of 350 
households resulted in the completion of 550 indi- 
vidual questionnaires. The Philadelphia sample of 361 
separate households resulted in 471 usable ques- 
tionnaires. 

The composition of the samples along selected 
social and economic characteristics (age, sex, educa- 
tion, head or nonhead of household, race, income, 
house ownership, and number in household) was 
compared with 1960 census data to determine repre- 
sentativeness. The proportion of females and the well 
educated appeared overrepresented in both Balti- 
more and Philadelphia results. 

A quasi quota sampling procedure was invoked 
during the Philadelphia study after the tactic of 
having the interviewer ask for the man in the house 
failed to adequately correct an imbalance of too many 
females. This procedure improved the final balance 
to 58% female, 42% male. Considering the probable 
changes in the population during the interval be- 
tween the 1960 census and the gathering of sample 
data, the distributions appeared to be representative 
for other characteristics. 


Questionnaire 


A questionnaire consisting of three parts and a 
household information cover sheet was used to collect 
data for the Philadelphia study.3 As indicated, 
this was a modified version of the Baltimore ques- 
tionnaire with the addition of satisfaction questions. 
Part A contained a set of questions designed to 
elicit descriptive information about the two trip 
purposes asked about in Parts B and C of the ques- 
tionnaire, that is, (a) the respondent’s last common 
or usual trip to work or school, and (b) his last 
common or usual in-town, nonwork trip. Part B 
included a set of 35 items measuring the importance 
of attributes contained in the items along a 7-degree 
Likert-type interval scale, ranging from “not at all 
important” to “of greatest importance.” These items 
were designed to measure exhaustively factors salient 
to modal choice decisions, suggested by a search of 
the literature and results obtained in the Baltimore 
study. 

Part C contained a set of 33 items constructed to 
determine satisfaction with the Part B attributes for 
auto and the respondent’s most likely form of public 
transportation for both trip purposes.4 


3 There were two questionnaires. The questions re- 
mained the same but their order was varied to con- 
trol halo and positional effects. 

4 Two attributes were eliminated in Part C because 
they were inapplicable when posed as satisfaction 
items. 

TABLE 1 
PERCENTAGE OF SATISFIED RESPONSES TO EACH ITEM FOR AUTO VERSUS PUBLIC MODES 
WITH ITEMS RANKED BY MEAN IMPORTANCE: WORK TRIP, TRIP PURPOSE 1 

Item no. | Description | Rank of importance | % satisfied with auto | % satisfied with public transit | Difference 
28 | Arrive without accident | 1 | 89 | 86 | 3 
11 | Arrive at intended time | 2 | 94 | 63 | 31 
18 | Safest vehicle | 3 | 94 | 91 | 3 
33 | Avoid stopping for repairs | 4 | 94 | 92 | 2 
12 | Shortest distance | 5 | 97 | 64 | 33 
24 | Fast as possible | 6 | 95 | 70 | 25 
17 | Avoid changing vehicle | 7 | 98 | 69 | 29 
2 | Vehicle unaffected by weather | 8 | 90 | 84 | 6 
5 | Protected from weather while waiting | 9 | 96 | 54 | 42 
1 | Shortest time | 10 | 92 | 65 | 27 
26 | Avoid waiting more than 5 minutes | 11 | 98 | 55 | 43 
22 | One-way cost of 25¢ rather than 50¢ | 12 | 88 | 69 | 19 
27 | Comfortable | 13 | 97 | 75 | 22 
13 | One-way cost of 25¢ rather than 35¢ | 14 | 88 | 67 | 21 
10 | Clean vehicle | 15 | 92 | 73 | 19 
14 | Feel independent | 16 | 89 | 58 | 31 
16 | Avoid walking more than a block | 17 | 97 | 65 | 32 
29 | One-way cost of 3¢ rather than 15¢ | 18 | 84 | 67 | 17 
3 | Cost | 19 | 91 | 62 | 29 
30 | Avoid unfamiliar area | 20 | 93 | 78 | 15 
7 | Travel when traffic is light | 21 | 73 | 64 | 9 
6 | Uncrowded vehicle | 22 | 94 | 51 | 43 
19 | Package and baggage space | 23 | 94 | 37 | 57 
31 | Pride in vehicle | 24 | 88 | 60 | 28 
20 | New modern vehicle | 25 | 86 | 68 | 18 
21 | Friendly people | 26 | 91 | 67 | 24 
23 | People you like | 27 | 93 | 66 | 27 
25 | Need not pay daily | 28 | 82 | 61 | 21 
32 | Avoid riding with strangers | 29 | 89 | 66 | 23 
4 | Listen to radio | 30 | 84 | 44 | 40 
9 | Ride with people who chat | 31 | 84 | 62 | 22 
15 | Look at scenery | 32 | 74 | 72 | 2 
8 | Take along family and friends | 33 | 87 | 57 | 30 

Note.—Satisfied = summated responses in Item Categories 5, 6, and 7 (i.e., generally, very well, completely satisfied, 
respectively). 
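
The "% satisfied" figures and the Difference column can be illustrated with a short sketch. The function name and the two response lists are hypothetical; the three table rows used for the consistency check are taken from Table 1.

```python
def percent_satisfied(responses, satisfied_categories=(5, 6, 7)):
    """Percentage of responses in the 'satisfied' categories of the 7-point scale."""
    if not responses:
        raise ValueError("no responses")
    hits = sum(1 for r in responses if r in satisfied_categories)
    return 100.0 * hits / len(responses)


# Hypothetical item responses (1-7), for illustration only.
auto_responses = [7, 6, 6, 5, 4, 7, 7, 6, 5, 3]
public_responses = [5, 4, 3, 6, 2, 4, 5, 3, 1, 4]
difference = percent_satisfied(auto_responses) - percent_satisfied(public_responses)

# Rows from Table 1: (item no., % auto, % public, difference).
table_rows = [(28, 89, 86, 3), (11, 94, 63, 31), (5, 96, 54, 42)]
rows_consistent = all(auto - public == diff
                      for _, auto, public, diff in table_rows)
```

The same identity (auto minus public equals the difference) holds for every row of the table, which makes it a convenient check on transcription.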


This questionnaire was designed to be self-ad- 
ministered, although the interviewer was available 
for any needed help. 


Analysis 


Analytical techniques used in both studies were 
quite similar, so the Philadelphia analysis will be 
discussed. The statistical analysis for Parts B and C 
of the questionnaire followed procedures typically 
employed for attitude data. A frequency distribution 
of responses to each item was developed and con- 
verted into percentages for each trip purpose. The 
mean and standard deviation were also computed 
for each item. 

Additionally, intercorrelations among the impor- 
tance items for each trip purpose were computed and 
factor analyzed. The factors were rotated using the 
Kaiser Varimax method.5 This approach was followed 
also for the Satisfaction section of the questionnaire; 
that is, factors were derived independently for the 
following trip purpose-mode use pairs: (a) Work 
Trip—Auto, (b) Work Trip—Public Transport, (c) 
Nonwork Trip—Auto, and (d) Nonwork Trip—Pub- 
lic Transport. The factors were defined using a factor 
loading cutoff point of .30. The relative importance 
and satisfaction of the factors identified in the factor 
analysis were derived by averaging the mean re- 
sponse of each component item in the dimension 
after other weighted alternatives were examined. 
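
The factor definition and scoring steps described above might be sketched as follows. The loadings and item means are invented for illustration, and two details are assumptions rather than statements from the text: the cutoff is applied to the absolute loading, and a loading exactly at .30 is counted as meeting it.

```python
def factor_items(loadings, cutoff=0.30):
    """Items whose absolute loading on a factor meets the cutoff."""
    return [item for item, loading in loadings.items()
            if abs(loading) >= cutoff]


def factor_score(item_means, items):
    """Unweighted average of the mean responses of the component items."""
    if not items:
        raise ValueError("factor has no component items")
    return sum(item_means[i] for i in items) / len(items)


# Hypothetical loadings on one factor and hypothetical item means.
loadings = {28: 0.71, 18: 0.64, 33: 0.58, 4: 0.12}
item_means = {28: 6.5, 18: 6.4, 33: 6.3, 4: 3.9}

items = factor_items(loadings)          # item 4 falls below the cutoff
score = factor_score(item_means, items)
```

An unweighted average is used because the text reports that averaging was adopted after weighted alternatives were examined.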


5 The authors wish to thank Emil Heerman, for- 
merly of the Department of Psychology, University of 
Maryland, and now at the University of Nebraska, 
for his assistance in the selection of a factor analysis 
method and interpretation of the factors. 

Because hundreds of possible relationships existed 
in the demographic and trip characteristic data, a 
method was devised to screen those relationships 
having both statistical and probable practical sig- 
nificance. The method involved dichotomizing or 
trichotomizing all responses for all demographic or 
trip characteristics. This resulted in 2 × 3 or 3 × 3 
cell matrices. A decision rule was established which 
stated that only those items that had at least a 10% 
difference in response frequencies between at least 
two cells in a 3 × 3 or 2 × 3 condensed distribution 
when the total number of respondents is greater 
than 300 (or 20% when less than 300) would be 
considered of practical significance. The standard 
error of proportion was computed for these distribu- 
tions and such differences are beyond the .01 level 
of significance. 
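
The screening rule above can be sketched as a small function. Several points are assumptions: the caller is expected to pass the percentages of the cells being compared, a total of exactly 300 respondents (a case the rule does not cover) is given the stricter 20% threshold, and the standard error shown is the usual formula for the difference between two independent proportions. The sample percentages are the income rows of Table 5, used only as illustrative input.

```python
from math import sqrt


def passes_screen(cell_percentages, n_respondents):
    """True if at least two cells differ by the required number of percentage points."""
    # Assumption: n == 300 is not covered by the rule; use the stricter threshold.
    threshold = 10.0 if n_respondents > 300 else 20.0
    return max(cell_percentages) - min(cell_percentages) >= threshold


def se_difference(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


income_cells = [60, 75, 73]              # % satisfied by income group (Table 5)
large_sample = passes_screen(income_cells, 471)
small_sample = passes_screen(income_cells, 250)
```

With the same 15-point spread, the relationship passes the screen for a sample above 300 but not for a smaller one.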

RESULTS 

Tables 1 and 2 show the relative levels of 
satisfaction by item for auto versus public 
transport for the work trip and nonwork trip. 
In addition, items are arrayed by average im- 
portance (as indicated by item importance 
means) so that importance and satisfaction 
levels can be determined easily for each item. 

Factor analysis of the Philadelphia results 
indicated that about 70% of the common 
variance can be accounted for by eight rela- 
tively independent factors defined by similar 
items for both trip purposes. The discussion 
that follows is organized around these factors 


TABLE 2 
PERCENTAGE OF SATISFIED RESPONSES TO EACH ITEM FOR AUTO VERSUS PUBLIC MODES 
WITH ITEMS RANKED BY MEAN IMPORTANCE: NONWORK TRIP, TRIP PURPOSE 2 

Item no. | Description | Rank of importance | % satisfied with auto | % satisfied with public transit | Difference 
28 | Arrive without accident | 1 | 86 | 77 | 9 
18 | Safest vehicle | 2 | 89 | 81 | 8 
33 | Avoid stopping for repairs | 3 | 89 | 81 | 8 
5 | Protected from weather while waiting | 4 | 95 | 47 | 48 
17 | Avoid changing vehicle | 5 | 96 | 68 | 28 
2 | Vehicle unaffected by weather | 6 | 90 | 81 | 9 
22 | One-way cost of 25¢ rather than 50¢ | 7 | 86 | 65 | 21 
11 | Arrive at intended time | 8 | 93 | 66 | 27 
27 | Comfortable | 9 | 97 | 69 | 28 
16 | Avoid walking more than a block | 10 | 95 | 57 | 38 
10 | Clean vehicle | 11 | 91 | 63 | 28 
13 | One-way cost of 25¢ rather than 35¢ | 12 | 85 | 63 | 22 
29 | One-way cost of 3¢ rather than 15¢ | 13 | 82 | 60 | 22 
26 | Avoid waiting more than 5 minutes | 14 | 93 | 45 | 48 
19 | Package and baggage space | 15 | 95 | 52 | 43 
12 | Shortest distance | 16 | 91 | 55 | 36 
3 | Cost | 17 | 89 | 54 | 35 
24 | Fast as possible | 18 | 90 | 64 | 26 
14 | Feel independent | 19 | 84 | 51 | 33 
6 | Uncrowded vehicle | 20 | 91 | 48 | 43 
30 | Avoid unfamiliar area | 21 | 91 | 72 | 19 
7 | Travel when traffic is light | 22 | 67 | 55 | 12 
1 | Shortest time | 23 | 91 | 55 | 36 
21 | Friendly people | 24 | 92 | 60 | 32 
23 | People you like | 25 | 91 | 60 | 31 
8 | Take along family and friends | 26 | 86 | 59 | 27 
20 | New modern vehicle | 27 | 84 | 62 | 22 
31 | Pride in vehicle | 28 | 89 | 53 | 36 
9 | Ride with people who chat | 29 | 82 | 56 | 26 
15 | Look at scenery | 30 | 78 | 76 | 2 
32 | Avoid riding with strangers | 31 | 91 | 58 | 33 
25 | Need not pay daily | 32 | 76 | 51 | 25 
4 | Listen to radio | 33 | 79 | 32 | 47 

Note.—Satisfied = summated responses in Item Categories 5, 6, and 7 (i.e., generally, very well, completely satisfied, 
respectively). 


TABLE 3 

SUMMARY OF ABSOLUTE AND RELATIVE IMPORTANCE 
AND SATISFACTION OF FACTORS COMPARING AUTO 
AND PUBLIC TRANSPORT FOR THE WORK TRIP 

Factor | Mean factor importance score (a) | Mean factor satisfaction score (b): Auto | Public | Difference 
Reliability | 6.39 | 6.15 | 5.89 | .26 
Travel time | 6.14 | 6.23 | 4.99 | 1.24 
Weather | 5.99 | 6.18 | 5.01 | 1.17 
Cost | 5.50 | 5.69 | 4.97 | .72 
State of vehicle | 5.13 | 5.95 | 5.10 | .85 
Unfamiliarity | 4.62 | [illegible] | [illegible] | .77 
Self-esteem | 4.61 | 5.90 | 4.49 | 1.40 
Diversions | 4.01 | 5.86 | 4.90 | .96 

Note.—For purposes of comparison, only those items that 
had high factor loadings and were common to both trip pur- 
poses were used to calculate the factor scores. 

a Listed in order of importance. 

b Highest possible score = 7.00. 


in approximate order of importance. Further- 
more, some of the more salient differences in 
perceived satisfaction with auto and public 
transit associated with demographic charac- 
teristics of respondents are examined. Tables 
3 and 4 summarize the importance of, and 
satisfaction with, auto and public transport 
considering each factor. 


Reliability of Destination Achievement 


An overwhelming percentage of respondents 
believed the items “arrive without accident,” 
“avoid stopping for repairs,” and “safest 
vehicle” were very important. Examination of 
Tables 1 and 2 indicates that public trans- 
portation was rated nearly as satisfactory as 
the automobile for the reliability items (Nos. 
28, 18, 33). These findings support the con- 
tention that although reliability and safety 
are important to transport users, possibilities 
of altering use patterns through persuasion 
campaigns based on a reliability and safety 
theme are not good. 


Travel Time 

The travel time factor including “arriving 
at the intended time” in the “shortest dis- 
tance,” as “fast as possible,” and in the 
“shortest time” is second in importance for 
the work trip but less significant for the nonwork 
trip. 

Satisfaction with the auto for the work trip 
is quite high (average for items in factor = 
6.23) and public transportation moderate 
(4.99). Considering both trip purposes, the 
differences in satisfaction (1.35 and 1.24) are 
among the largest between auto and public 
transport for this important factor, and thus 
travel time is probably a key determinant in 
the choice of auto over public transport. 

Furthermore, examination of demographic 
variability indicates the respondents more 
satisfied with the auto for travel time con- 
siderations have middle to high incomes, are 
white, in lower age categories, and live in a 
single-unit dwelling. For example, Table 5 
shows the percentage satisfied with the auto 
categorized by income, age, race, and type of 
dwelling for one of the travel time items (get 
there in the shortest time). Middle- and upper- 
class relative satisfaction with auto for time 
indicates changes to public transit will be dif- 
ficult to accomplish unless the perceived time 
preference for auto is reduced or overcome. 


Weather Factor 


A weather factor follows travel time in im- 
portance and includes “vehicle unaffected by 
weather” and “protected from weather while 
waiting.” The auto is perceived to be markedly 
more satisfactory than public transport 
for protection while waiting, as shown in 
Tables 1 and 2, but does not have much ad- 
vantage for the vehicle unaffected by weather 
item. These differences are noteworthy be- 
cause they pertain to the so-called trade-off 
hypothesis that suggests passenger exposure 
to bad weather while waiting for service may 
be offset by the imperviousness of public trans- 
port to such weather. Apparently, the “go in 
any weather” advantage sometimes attributed 
to public transport is not a strongly held 
opinion by the respondents. Thus, it may not 
effectively counterbalance the definite dissatis- 
faction with public transport regarding the 
protection-from-weather-while-waiting item. 


TABLE 4 

SUMMARY OF ABSOLUTE AND RELATIVE IMPORTANCE 
AND SATISFACTION OF FACTORS COMPARING AUTO 
AND PUBLIC TRANSPORT FOR THE NONWORK TRIP 

Factor | Mean factor importance score (a) | Mean factor satisfaction score (b): Auto | Public | Difference 
Reliability | 6.34 | 5.99 | 5.64 | .35 
Weather | 5.98 | 6.13 | 4.84 | 1.29 
Convenience | 5.78 | 6.38 | 4.86 | 1.52 
Cost | [illegible] | [illegible] | [illegible] | .97 
Travel time | 5.26 | 6.07 | 4.72 | 1.35 
State of vehicle | 5.10 | 5.84 | 4.93 | .91 
Congestion | 5.02 | 5.64 | 4.43 | 1.21 
Unfamiliarity | 4.56 | 6.03 | 4.88 | 1.15 
Diversions | 4.45 | 5.88 | 4.78 | 1.10 
Self-esteem | 4.25 | 5.78 | 4.20 | 1.58 

Note.—For purposes of comparison, only those items that 
had high factor loadings and were common to both trip purposes 
were used to calculate the factor scores. 

a Listed in order of importance. 

b Highest possible score = 7.00. 


Cost 


The cost factor, including the four items 
concerning transport cost, is rated as fourth 
in importance for both the work and the non- 
work trip. The respondents saw a distinct ad- 
vantage—low cost—in the automobile versus 
public transport mode for both the work and 
nonwork trips. 

For many consumers, the following may be 
hypothesized: (a) they have made a decision 
to have a car (based on several considera-
tions); (b) thus, only the variable trans-
portation costs are significant for modal choice
decisions; and (c) they view the car to be 
equal to or less costly than public transit con- 
sidering variable costs. The latter hypothesis 
is supported by evidence from other studies, 
such as Survey Research Center data that 
showed that of individuals who felt they had 
a choice between auto and public modes 94 
believed the car to be equal to or less costly 
than the common carrier, while 75 thought the 
auto to be more expensive. 

This does not mean that cost should be 
underestimated as one of the determinants of 
consumer decision making. Smerk has re- 
ported studies where significant increases in 
traffic were found to result from fare reduc- 
tions (Smerk, 1964). As indicated in the 
present study, cost is ranked fairly high in 
importance, and respondents are more satis- 
fied with the cost of auto than with public 
transit. A somewhat analogous situation exists 
in industry with respect to employee percep- 
TABLE 5


PERCENTAGE SATISFIED WITH AUTO FOR “SHORTEST
TIME” ITEM COMBINING TRIP PURPOSES


Category                  % satisfied

Income
  < $6,000                    60
  $6,000-9,999                75
  $10,000 and over            73
Age
  Under 35                    76
  35-55                       69
  Over 55                     59
Race
  White                       72
  Other                       60
Dwelling
  Single unit                 81
  Apartment                   63





tions of the importance of wage levels. Such 
factors as job security and opportunity for 
advancement have been consistently ranked 
higher. However, when asked what factors 
contribute most to dissatisfaction, wages are 
mentioned frequently. These results may be 
interpreted as suggesting that the present 
level of wages for most employees is high 
enough so that other factors are more crucial. 
However, if wages are dropped significantly, 
or if they are perceived to be inequitably de- 
termined, it is likely that their importance in- 
creases substantially, as suggested by the dis- 
satisfaction evidence. Assuming reasonable 
service equivalence, perhaps the same phe- 
nomenon prevails if there is a significant in- 
crease in the cost of auto operation (e.g., fees 
and tolls) or decrease in the cost of common 
carrier (e.g., subsidy). It is also suggested 
that of these two possibilities, increasing the 
cost of using autos is more effective than de- 
creasing the cost of the common carrier, since 
neither the present variable cost of driving 
nor common carrier cost is apparently con- 
sidered seriously burdensome by many travel- 
ers. Of course, other considerations might 
make it impractical to attempt such a change, 
for example, political infeasibility. 

Table 6 indicates for the low cost item the 
percentage satisfied with the auto and with 
public transit categorized by demographic 
variables. This table shows quite clearly the 
much smaller percentage satisfied with the 
TABLE 6


PERCENTAGE SATISFIED WITH COST OF TRIPS


Category                  Auto    Public transit    Difference

Income
  < $6,000                54%         26%              28%
  $6,000-9,999            65%         37%              28%
  $10,000 and over        67%         38%              29%
Education
  < High school           63%         31%              32%
  High school             73%         29%              44%
  > High school           69%         46%              23%
Distance from central
business district
  < 3 miles               58%         22%              36%
  3-5 miles               70%         43%              34%
  > 5 miles               83%         45%              38%





low cost of public transit as well as the pat- 
tern of those with lower income, lower educa- 
tion, and residing closer to the central busi- 
ness district showing relatively less satisfac- 
tion. Presumably, the less affluent group would 
be most affected by lowering the costs of 
public transit or raising the cost of auto 
travel. However, this group already has a 
relatively high use rate of public transit and 
relatively low rate for the auto compared to 
the more wealthy. Thus, cost changes might 
help in changing use patterns, but rather less 
well than might be desired. Cost or changes 
in cost may be included, however, as one of 
the elements in the transport package (in- 
cluding travel time, convenience, and weather 
protection) that determine auto favorability 
over public transit. 


State of Vehicle 


Tables 1 and 2 indicate that the items in
this factor, “clean vehicle” and “new modern
vehicle,” have moderate importance, with the
former slightly more important.

The automobile again emerges as more 
satisfactory to respondents than does public 
transport for the newness and the cleanliness 
of the vehicle. The average difference (.91 and 
.85) is not as large as it is for most other 
factors, suggesting possibly that changes in 
the state of the vehicle would have a relatively 
small effect on consumer choice. 
Convenience


Two items that ranked quite high in im- 
portance defined a “convenience” factor. They 
were “avoid changing vehicle” (6.10 and 
5.99) and “avoid waiting” (5.86 and 5.40). 
In both cases, the auto was seen as satisfac- 
tory by a substantially greater percentage of 
respondents than was public transit as indi- 
cated in Tables 1 and 2. 

Satisfaction as measured by the combined 
trip purpose percentage increased with income 
and distance of residence from the central 
business district, as illustrated by the “avoid 
changing vehicle” item in Table 7. It is also 
evident that differences in satisfaction between 
auto and public modes are consistently large 
(31-37%) in favor of the auto for all demo- 
graphic categories. Thus, it appears there is 
a tendency for higher income suburbanites to 
be more satisfied with either mode of travel 
than the low-income close to central business 
district group, perhaps simply reflecting a 
“more satisfied” general syndrome. Both 
groups obviously agree, however, that the 
auto is significantly more satisfactory than 
public modes. Improvement in convenience 
seems a high priority factor in possible at- 
tempts to divert use patterns of the two 
groups considering both importance and satis- 
faction data. 


Unfamiliarity


Similar to the state of vehicle factor, the
items in this factor “avoid riding with


TABLE 7


AVOID CHANGING VEHICLE SATISFACTION
(PERCENTAGE SATISFIED)


Category                  Auto    Public transit    Difference

Income
  < $6,000                74%         43%              31%
  $6,000-9,999            88%         57%              31%
  $10,000 and over        88%         51%              37%
Distance from central
business district
  < 3 miles               67%         31%              36%
  3-5 miles               84%         47%              37%
  > 5 miles               92%         59%              33%
TABLE 8


AVOID UNFAMILIAR AREA SATISFACTION
(PERCENTAGE SATISFIED)


Category                  Auto    Public transit    Difference

Income
  < $6,000                57%         37%              20%
  $6,000-9,999            76%         60%              10%
  $10,000 and over        82%         51%              31%
Education
  < High school           62%         44%              18%
  High school             77%         52%              25%
  > High school           78%         57%              21%
Distance from central
business district
  < 3 miles               62%         44%              18%
  3-5 miles               77%         52%              25%
  > 5 miles               78%         57%              21%


strangers” and “avoid unfamiliar area” have 
some to moderate importance (4.07 and 5.17). 

The auto allows a person to travel with 
whom he pleases and to stay in familiar ter- 
ritory. Public transportation usually does not 
provide this sort of choice. As might be ex- 
pected, the auto was rated significantly better 
in satisfying this requirement as indicated in 
Tables 1 and 2. Moreover, satisfaction with 
the unfamiliarity factor varies with income, 
education, and distance from the central busi- 
ness district. For example, Table 8 shows 
those who are less highly educated, have lower 
incomes, and live closer to the central busi- 
ness district tend to be less satisfied in being 
able to avoid unfamiliar areas in their recent 
transportation. They perhaps are more likely 
to begin or end their trips in unfamiliar areas. 
In traveling to work or to shop, they may
tend to go outside of their own areas, perhaps
reflecting the perception that jobs and/or
prices are more attractive elsewhere. A per-
son traveling in such a vicinity is likely not
to be fully familiar with each neighborhood.
Again, the auto apparently provides a more
satisfactory means of avoiding or lessening
this problem.


Self-Esteem 


The self-esteem factor was defined by sev- 
eral items including feeling independent, drive 
vehicle yourself, pride in vehicle, and satis-
faction of owning vehicle. This discussion is
limited to the most important item, feeling 
independent. This item was rated as very im- 
portant (5.52) on the average, and about 
65% of the respondents felt it to be very im- 
portant or of greatest importance for the 
work trip (55% for nonwork trip). 

Table 9 shows the percent of respondents 
satisfied with auto and with public transit as 
it varies with income, education, and distance 
from the central business district. The largest 
differences occur for those with higher in- 
comes, more education, and who live farther 
from the central business district, which is 
interpreted as additional support for the con- 
tention that it will be very difficult to provide 
an alternative form of public transportation 
as satisfactory as the auto for the more af- 
fluent group. 


Congestion and Diversions 


The findings of the Baltimore and Phila- 
delphia study support the results of the Sur- 
vey Research Center which indicate that con- 
gestion (travel when traffic is light, un- 
crowded vehicle) and diversions (take along 
family and friends, look at scenery, listen to 
radio, ride with people who chat, friendly 
people) are relatively unimportant in choos- 
ing between auto and common carrier. 

Public transport is seen as less satisfactory
than the auto for providing the user with
opportunities for diversion and for avoiding
congestion. For example, for the work trip the
auto received an average satisfaction level of
6.27 (high satisfaction) for the uncrowded
vehicle item, whereas the public transportation
alternatives were rated as only providing
moderate satisfaction (4.38 average). However,
this large difference in satisfaction is
counterbalanced by the relatively low importance
of this factor when considering possible
implications for changing travel behavior.


TABLE 9


FEELING INDEPENDENT SATISFACTION
(PERCENTAGE SATISFIED)


Category                  Auto    Public transit    Difference

Income
  < $6,000                57%         30%              27%
  $6,000-9,999            72%         38%              34%
  $10,000 and over        82%         35%              47%
Education
  < High school           61%         38%              23%
  High school             74%         32%              42%
  > High school           78%         30%              48%
Distance from central
business district
  < 3 miles               49%         25%              24%
  3-5 miles               71%         45%              26%
  > 5 miles               81%         35%              46%


DISCUSSION


In assessing ways of modifying consumer 
decision making and travel behavior, it is im-
portant to measure levels of resistance to change
for various groups. One method is to examine
the differences in average satisfaction levels 
between public transit and auto considering 
various attributes and their importance. An- 
other approach is to examine percentage dif- 
ferences between various demographically de- 
fined subgroups’ perceived satisfaction with 
public transit. The larger differences (along 
with relatively high importance) for charac- 
teristics would probably indicate a high level 
of resistance to change for the demographic 
group considered. 
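The first method can be illustrated with the nonwork-trip scores in Table 4. The following is only a sketch: the function names and the two cutoff values are hypothetical choices made for illustration and are not part of the study, and the Cost factor is left out because its scores are not legible in this copy.

```python
# Mean factor scores for the nonwork trip, transcribed from Table 4:
# factor -> (importance, auto satisfaction, public transport satisfaction).
# Cost is omitted because its scores are illegible in the source copy.
TABLE_4 = {
    "Reliability": (6.34, 5.99, 5.64),
    "Weather": (5.98, 6.13, 4.84),
    "Convenience": (5.78, 6.38, 4.86),
    "Travel time": (5.26, 6.07, 4.72),
    "State of vehicle": (5.10, 5.84, 4.93),
    "Congestion": (5.02, 5.64, 4.43),
    "Unfamiliarity": (4.56, 6.03, 4.88),
    "Diversions": (4.45, 5.88, 4.78),
    "Self-esteem": (4.25, 5.78, 4.20),
}


def satisfaction_gaps(table):
    """Return (factor, importance, auto minus public), largest gap first."""
    rows = [
        (name, importance, round(auto - public, 2))
        for name, (importance, auto, public) in table.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def resistant_factors(table, min_importance=5.0, min_gap=1.0):
    """Factors that are both important and much better satisfied by the auto.

    The two thresholds are hypothetical cutoffs chosen for illustration.
    """
    return [
        name
        for name, importance, gap in satisfaction_gaps(table)
        if importance >= min_importance and gap >= min_gap
    ]


if __name__ == "__main__":
    for name, importance, gap in satisfaction_gaps(TABLE_4):
        print(f"{name:<17} importance={importance:.2f} gap={gap:.2f}")
    print(resistant_factors(TABLE_4))
```

With these illustrative cutoffs, Self-esteem shows the largest gap but falls below the importance cutoff, which is the kind of trade-off between importance and satisfaction difference that the paragraph above describes.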

Such important characteristics with large 
satisfaction differences for the Baltimore and 
Philadelphia respondents were many. Percep- 
tion of travel time, susceptibility to weather, 
avoidance of changing vehicles, avoidance of 
waiting, avoiding the unfamiliar, and perceiv- 
ing an independent feeling were more sig- 
nificant than cost, reliability of destination 
achievement, state of vehicle, congestion, and 
diversions in choosing between auto and pub- 
lic transit according to the evidence presented. 

The most satisfied group of transport con- 
sumers and perhaps the group most difficult to 
change were the middle and upper income 
suburbanites living more than 3 miles from 
the central business district. They are, of 
course, the ones who possess one or more 
autos and find it more feasible and satisfy- 
ing to use them for work and nonwork trips. 
These same respondents, though they tended 
to be fairly well satisfied with public transit 
for many characteristics, indicate strong pref-
erences for the auto when considering such
factors as feeling independent, saving time, 
avoiding change of vehicles, and avoiding un- 
familiar areas. 

The affluent group presumably would be a 
key target of persuasion campaigns to use 
public transit, since they use it the least. The 
data presented herein should be useful in 
identifying the content and strategy of such 
campaigns. Although evidence in this study 
indicates a consistent preference for auto over 
public transit, it is clear that there are large 
differences in the magnitude of this preference 
when various mode attributes are considered. 
If these magnitude differences are coupled 
with significant differences in the importance 
of attributes, implications for the direction 
and likely success of future attempts to 
change mode use patterns become clearer. 
The data in this study suggest that it will 
take substantial ingenuity and resources to 
make public transit an attractive alternative 
for those predominantly using the automobile 
for their work and nonwork trips. 
CORRELATES OF EMPLOYEE EVALUATIONS
OF PAY INCREASES 
Within a large, white-collar, industrial population, average perceptions of
small, average, or large increases in salary formed a relatively constant
function of level of current salary. The analogy to the psychophysical
Weber/Fechner model, while explaining much of the variance in perceptions of
salary increases, was not complete. Additional variability was related to a
series of demographic variables, with higher dollar expectations registered by
college-educated employees versus noncollege, younger employees versus older,
exempt employees versus nonexempts, and among nonexempts by males versus
females. The results suggest that probable earnings potential, in addition to
current earnings level, contributes variance to differences in perceptions of
equitable salary increases.


A recent review of research in the area of 
compensation (Opsahl & Dunnette, 1966) il- 
lustrates that very little is really known about 
the incentive value of money. Most of the 
data that have been published are exclusively 
psychological, usually dealing primarily with 
various aspects of satisfaction with pay, pay 
expectations, or rated importance of money 
in comparison with other job elements (e.g., 
the studies reviewed in Herzberg, Mausner, 
Peterson, & Caldwell, 1957; and Vroom, 
1964). While job attitudes with respect to pay 
may be important correlates of behavior, 
without some basis for relating attitudes to 
objective pay in terms of compensation 
dollars there is only limited direct utility from 
these studies for the more efficient adminis- 
tration of compensation. And, on the other 
hand, what “hard” data there are dealing 
with compensation in turn are often not 
tied to perceptions (e.g., Haire, Ghiselli, & 
Gordon, 1967). 

Studies that do attempt to evaluate the 
bridge between actual compensation and atti- 
tudes tend to deal with perceptual data in 
the form of general satisfaction and overall 
expectations regarding pay, evaluating these 
perceptions for individuals at differing com- 
pensation levels (e.g., Andrews & Henry, 
1963; or Lawler & Porter, 1963). The most 
prevalent interpretation emerging from these 
analyses is that satisfactions regarding pay
are less a function of absolute current levels
of pay than they are of motivational variables
such as goals and expectations, or of back-
ground factors such as sex, age, or educa-
tion which serve as reference points for build-
ing goals and expectations.

1 Requests for reprints should be sent to J. R.
Hinrichs, IBM Corporation, 112 East Post Road,
White Plains, New York 10601.

Most frequently, psychological studies of 
compensation deal only tangentially, if at all, 
with the extent to which established salary 
administration policies and practices serve as 
normative factors shaping perceptions of pay. 
Past experience with pay, one would expect, 
is probably a crucial variable in shaping 
future expectations; in large measure one 
would expect that demographic variables cor- 
relate with perceptions of pay primarily as a 
result of common variance attributable to 
differences in actual pay treatment. Thus, 
it would seem desirable for research in the 
area of compensation to take a more critical 
and comprehensive look at the motivational 
and behavioral correlates of actual dollar 
amounts of pay, under the assumption that 
these data will capture the essence of the 
compensation system that shapes the indi- 
vidual’s psychological world in this critical 
area. 

An organization’s pay practices and policies 
are expressed most demonstrably as changes 
in pay. The key variable in understanding 
perceptions of pay, if we wish to deal with 
the norms of the pay system as it shapes 
these perceptions, must be incremental pay, 
or raises. And the research paradigm of clas- 
sical psychophysics dealing with perceptions 
of incremental stimulus intensity would seem
completely appropriate to the study of percep- 
tions of pay. 

A recent study by Zedeck and Smith (1968)
has used the psychophysical method of limits 
to evaluate perceptions of equitable salary 
among junior executives and secretaries within 
a Midwestern academic institution. While the 
study suggests the utility of this method in 
providing an understanding of perceptions of 
pay equity, due to limitations in sample size, 
methodological problems in using the survey 
technique for data collection, and the re- 
stricted range of occupations and backgrounds 
of Ss in the study, it is impossible to draw 
any conclusions regarding perceptions of 
equitable salary increments among industrial 
employees. 

The present study is an attempt to provide 
data that bridge both the area of perceptions 
regarding pay and hard data on actual levels 
of compensation for a large and diverse in- 
dustrial population. The study focuses di- 
rectly on attitudes regarding pay increases 
and deals with two overlapping propositions: 
(a) that on the average, expectations re- 
garding compensation follow a relatively con- 
sistent pattern that is largely a function of 
an individual’s current absolute level of earn-
ings; and (b) beyond this general patterning,
pay expectations are further influenced by 
normative factors that are largely a function 
of an individual’s personal situation and 
background. 

The first proposition assumes that some 
form of Weber-Fechner relationship exists 
with regard to money; that is, that percep- 
tions of what would be a just noticeable 
increment in earnings is largely a constant 
function of current earnings. Or, in more 
prosaic terms, it assumes that the more money 
an individual makes, the more of an increment 
in salary it will take to stir his feelings, and 
that this increment is a relatively constant 
fraction of his current level of earnings. 
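Stated compactly, and using notation introduced here for illustration rather than taken from the study (S for current monthly salary, ΔS for the just noticeable increment, and k for the constant fraction), the first proposition and its implication for a log-log plot are:

```latex
\frac{\Delta S}{S} = k
\quad\Longrightarrow\quad
\log \Delta S = \log k + \log S
```

If the proposition held exactly, a plot of log ΔS against log S would be a straight line with a slope of 1 and an intercept of log k.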

The second proposition says that while, on 
the average, just noticeable pay increments 
will reflect some relatively constant function 
of current earnings, there will be considerable 
variability around this average and that this 
variability will be associated with a number 
of factors describing the individual’s back-
ground and particular situation. Based upon
the analyses by Andrews and Henry (1963), 
Klein and Maher (1966), and by Penzer 
(1969), one might expect level of education 
to be one such factor influencing an indi- 
vidual’s expectation. The analysis by Lawler 
and Porter (1963) across various levels of 
management would suggest that organiza- 
tional level would be another such variable. 
Similarly, we might expect age, sex, and 
occupation to be related significantly to 
earnings expectations. 


METHOD 


The data in this study come from an employee 
attitude survey administered to roughly 1,500 em- 
ployees in a large and geographically dispersed in- 
dustrial organization. Survey participants were in- 
volved in a diversity of white-collar occupations. 
Three-quarters of them were male and one-fourth 
female. Forty-four percent were nonexempt em- 
ployees, 38% nonmanagerial exempt employees, and 
18% were managers. All were paid on a salary basis. 
While the questionnaire consisted of over 200 items 
dealing with a broad spectrum of attitude and 
opinion areas, only a few items were used in the 
analyses reported here. 

The key item in the analysis consisted of a listing 
of hypothetical dollars-per-month increases ranging 
from $1 per month to $1,000. Instructions stated: 


Everyone will agree that a monthly increase of 
$1,000 would be an “extremely large” salary in- 
crease. At the same time, an increase of $1 per 
month would most likely be viewed as a “just 
barely noticeable increase.” Somewhere between 
these extremes people would view different dollar 
amounts as representing “large” or “average” or
“small” increases in salary. 

To provide data dealing with a theory of 
“rewards,” we would like you to think about how 
you would view different salary increases. Please 
think realistically. We would like you to divide 
the following list of dollar amounts into five 
segments representing dollar increases in monthly 
salary which you would tend to view as falling 
in each of these categories: 


1. Extremely large increases in monthly salary;
I would be flabbergasted.

2. Large salary increases; I would be pleasantly
surprised.

3. Neither large nor small increases; about
average.

4. Small salary increases; I would be somewhat
disappointed.

5. Just barely noticeable salary increases; ones
which essentially would not be viewed as an
increase at all.


The salary increments included in the list were $1, 
$5, $10, $15, $20, $25, $30, $40, $50, $60, $80, $100, 
$125, $150, $175, $200, $250, $300, $400, $500, $1,000.
The data that were recorded were the midpoints of 
the four intervals that each respondent indicated 
divided this series into the five different categories 
of dollar increases for him. 
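One way to picture the recorded values is sketched below. The source does not describe the response format in detail, so the input convention used here (the largest listed amount a respondent placed in each of the four lower categories) and the function name are assumptions made for illustration only.

```python
# Dollar-per-month increments listed in the questionnaire item.
INCREMENTS = [1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 80, 100,
              125, 150, 175, 200, 250, 300, 400, 500, 1000]


def dividing_midpoints(largest_in_category):
    """Midpoints of the four intervals that split the list into five categories.

    `largest_in_category` holds, in ascending order, the largest listed amount
    the respondent placed in each of the four lower categories ("just barely
    noticeable", "small", "average", "large"). This input format is an
    assumption; the source only says that interval midpoints were recorded.
    """
    if len(largest_in_category) != 4:
        raise ValueError("expected exactly four dividing amounts")
    midpoints = []
    for amount in largest_in_category:
        position = INCREMENTS.index(amount)
        if position == len(INCREMENTS) - 1:
            raise ValueError("no listed amount lies above the last increment")
        # The dividing point lies halfway between this amount and the next one.
        midpoints.append((amount + INCREMENTS[position + 1]) / 2)
    return midpoints


if __name__ == "__main__":
    # Hypothetical respondent: $1-15 barely noticeable, $20-30 small,
    # $40-60 average, $80-200 large, $250 and up extremely large.
    print(dividing_midpoints([15, 30, 60, 200]))
```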

Another key item in the analysis asked: “What
is your current monthly salary?”


1. Under $400
2. $400 to $499
3. $500 to $599
4. $600 to $699
5. $700 to $799
6. $800 to $899
7. $900 to $1099
8. $1100 to $1399
9. $1400 to $2000
10. Over $2000


Various demographic variables such as sex, age, 
education, and organizational level were also used 
in the analysis. 


RESULTS 


For each individual, the midpoints of the 
salary range that he indicated as separating 
“just barely noticeable” from “small” salary 
increases, “small” from “average,” “average” 
from “large,” and “large” from “extremely 
large” increases were recorded. Table 1 shows 
the means and standard deviations from these 
midpoints for male employees at each of the 
10 different levels of current monthly earn- 
ings. Table 2 presents comparable data for 
females. The “percentage of current salary”
presented for each of these dividing points is
the mean of the hypothetical increases
indicated by people at each of the current
salary levels, divided by the midpoint of the
class interval for that salary level; for the top
and bottom categories in the current monthly
salary distribution, which were open-ended
categories in the questionnaire, an assumed
midpoint was used for purposes of
computing the percentages.
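The computation can be sketched as follows. The function names are hypothetical, and the example values are legible entries from Tables 1 and 2; because the published means are rounded to whole dollars, the recomputed percentages agree with the tables only to within rounding.

```python
def class_midpoint(low, high):
    """Midpoint of a closed salary class interval such as $400 to $499."""
    return (low + high) / 2


def percent_of_current_salary(mean_increase, midpoint):
    """Mean dollar increase expressed as a percentage of the class midpoint."""
    return 100 * mean_increase / midpoint


if __name__ == "__main__":
    # Males, $400-499: mean "just barely noticeable" dividing point of $19.
    print(round(percent_of_current_salary(19, class_midpoint(400, 499)), 1))
    # Males, $900-1099: mean of $31.
    print(round(percent_of_current_salary(31, class_midpoint(900, 1099)), 1))
    # Females, under $400: mean "small" dividing point of $28,
    # using the $350 midpoint assumed in the table footnote.
    print(round(percent_of_current_salary(28, 350), 1))
```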

As these tables suggest, the average dollars 
that were indicated as the dividing point be- 
tween “average” and “small” salary increases 
and between “small” and “just barely notice- 
able” increases, etc., generally increase in a 
monotonic fashion with total salary. The 
“Weber fraction” is relatively stable, espe- 
cially for the women, regarding the salary 
that would be perceived as “small” versus 
“just barely noticeable.” There is, however, 
a slight though consistent tendency for these 
ratios of salary increase to salary level to 
decrease as level of earnings increases, for all 
of the categories of perceived increases. This 
suggests that perhaps the pure psychophysical 
model may not hold but that perhaps other 
factors are also systematically affecting these 
perceptions. The psychophysical model is most
stable in the range between $400 and $2,000
current salary, and the data at either extreme
of the distribution should probably be
deemphasized because the class interval was not
completely specified in the
questionnaire, and the midpoint salary was


TABLE 1


PERCEPTIONS OF MINIMAL AND AVERAGE SALARY INCREASES AS A
FUNCTION OF CURRENT EARNINGS FOR MALES


                            “Just barely noticeable”      “Small” (versus
                            (versus “small”) increases    “average”) increases
Current
monthly                                    % of current
salary              N       X̄      σ      salary          X̄      σ

Under $400(a)       18      $18     6      [?]             $32     10
$400-499            96      $19     7      4.2             [?]     [?]
$500-599            107     $20     8      [?]             $38     17
$600-699            95      $21     10     3.3             $39     14
$700-799            [?]     [?]     [?]    [?]             $47     18
$800-899            [?]     [?]     [?]    [?]             [?]     17
$900-1099           193     $31     14     3.1             $59     21
$1100-1399          160     $38     16     3.0             [?]     22
$1400-2000          [?]     [?]     22     2.9             $91     29
Over $2000(b)       34      $56     24     [?]             $107    35


[?] = value illegible in the source copy. The percentage column for
“small” (versus “average”) increases and the columns for “average”
(versus “large”) and “large” (versus “very large”) increases are not
recoverable from this copy.

a Midpoint assumed at $350.
b Midpoint assumed at $2500.
TABLE 2


PERCEPTIONS OF MINIMAL AND AVERAGE SALARY INCREASES AS A
FUNCTION OF CURRENT EARNINGS FOR FEMALES


                        “Just barely         “Small” (versus      “Average” (versus    “Large” (versus
                        noticeable” (versus  “average”)           “large”)             “very large”)
                        “small”) increases   increases            increases            increases
Current
monthly
salary            N     X̄     σ     %       X̄     σ     %       X̄     σ     %       X̄     σ     %

Under $400(a)     30    $14    5     3.9     $28    13    8.0     $61    28    17      $159   126   45
$400-499          113   $16    6     [?]     $29    8     6.4     $60    20    13      $137   57    30
$500-599          81    $19    7     [?]     [?]    [?]   6.4     $68    27    12      $145   71    26
$600-699          70    $22    8     3.4     $40    13    6.2     $79    31    12      $168   [?]   [?]
$700-799          35    $26    10    3.5     $49    16    6.5     $95    27    13      $178   54    24
$800-899          14    $26    10    3.1     $48    15    5.6     $90    21    11      $161   38    19
$900-1099         12    $35    14    [?]     $60    15    6.0     $104   28    10      $261   [?]   [?]
$1100-1399        9     $47    25    [?]     $82    27    6.5     $145   31    12      $261   49    21
$1400-2000        3     -      -     -       -      -     -       -      -     -       -      -     -
Over $2000(b)     0     -      -     -       -      -     -       -      -     -       -      -     -


% = percentage of current salary. [?] = value illegible in the source copy.

a Midpoint assumed at $350.
b Midpoint assumed at $2500.


merely assumed in computing the fraction. 
Omitting these two extreme categories, the 
percentage increase cited as dividing “just 
barely noticeable” from “small” ranges be- 
tween 4.2% and 2.9% for males, with an 
average of 3.3%; the range is between 3.1% 
and 3.7% for females with an overall average 
of 3.5%. The overall average “small” versus 
“average” increase split is 6.1% for males and 
6.3% for females. It is interesting to com- 
pare this with the average K of 6.2% for 
secretaries reported by Zedeck and Smith 
(1968, p. 345). 
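The text does not say how the overall averages were formed. As an illustration only, the sketch below takes the legible female rows of Table 2 (omitting the two open-ended salary categories, as the text does) and compares an unweighted mean of the row percentages with a mean weighted by the number of respondents in each row. The function name and the choice of weighting are assumptions, not the author's stated procedure.

```python
# Female respondents, $400-499 through $1100-1399, from Table 2:
# (N, "small" versus "average" dividing point as % of current salary).
FEMALE_SMALL_VS_AVERAGE = [
    (113, 6.4), (81, 6.4), (70, 6.2), (35, 6.5),
    (14, 5.6), (12, 6.0), (9, 6.5),
]


def weighted_mean(pairs):
    """Mean of the percentages, weighting each row by its number of respondents."""
    total_n = sum(n for n, _ in pairs)
    return sum(n * pct for n, pct in pairs) / total_n


if __name__ == "__main__":
    unweighted = sum(pct for _, pct in FEMALE_SMALL_VS_AVERAGE) / len(FEMALE_SMALL_VS_AVERAGE)
    print(round(unweighted, 1))
    print(round(weighted_mean(FEMALE_SMALL_VS_AVERAGE), 1))
```

Under this reading of the table, the respondent-weighted mean rounds to the 6.3% reported for females, which suggests, without confirming, that the overall averages were computed across individuals rather than across salary levels.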

While there appears to be a fair amount 
of stability in defining these percentages for 
small or for average increases, it is clear 
that there is a less stable perception of what 
represents a large or a very large increase. 
One might suspect that the norms of the 
organization with regard to salary administra- 
tion policy and treatment engender certain 
fairly stable expectations of the average, but 
that evaluations of what constitutes a large 
increase are built largely on wishful thinking, 
conjecture, and a considerable element of 
random response. 

Figures 1 and 2 further clarify these rela- 
tionships. These figures are log-log plots of 
current monthly salary versus perceptions of 
mean increases, separately for males and for 


females. The trend lines for perceived small 
and average increases are quite linear, con- 
forming to the logarithmic model as suggested 
by Haire et al. (1967), while the trends for 
large and very large increases are less regular. 
Based upon these plots and the data in the 
tables, it would seem that average perceptions 
of various salary increases in large measure 
do follow a lawful relationship to current 
levels of salary, and that these perceptions 
conform to a considerable degree to the 
Weber-Fechner type of relationship. The 
slight S cast to the curves in Figures 1 and 2 
highlights the apparently systematic devia- 
tion from the psychophysical model discussed 
above. As Tables 1 and 2 indicate, there is 
considerable variability around the average 
perceptions of increases used in computing 
these Weber fractions. The data in Tables 3 
through 6 illustrate that to a certain extent, 
various demographic factors are associated 
with this variance and account for additional 
variability beyond that which may be attrib- 
uted to current levels of salary. For each of 
the current monthly salary levels the distribu- 
tion of dollar increases that individuals within 
each level cited as separating small from 
average increases were divided as closely as 
possible into equal thirds. Tables 3-6 contrast 
individuals falling in the upper third of the 
distribution of perceived increases within their
respective current salary level with those fall- 
ing in the middle third and those falling 
in the lower third for various demographic 
classifications. 
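The within-level split into thirds can be sketched as below. The record layout, the function name, and the handling of ties (respondents with equal dividing points are ordered arbitrarily by identifier) are assumptions made for illustration; the source says only that each level's distribution was divided as closely as possible into equal thirds.

```python
from collections import defaultdict


def split_into_thirds(records):
    """Label each respondent lower, middle, or upper within a salary level.

    `records` is an iterable of (respondent_id, salary_level, dividing_point),
    where dividing_point is the dollar amount separating "small" from
    "average" increases. Ranking is done separately for each salary level,
    so current salary is held constant in later comparisons.
    """
    by_level = defaultdict(list)
    for respondent_id, level, value in records:
        by_level[level].append((value, respondent_id))

    names = ("lower", "middle", "upper")
    labels = {}
    for members in by_level.values():
        members.sort()  # ascending by dividing point, ties broken by identifier
        count = len(members)
        for rank, (_, respondent_id) in enumerate(members):
            labels[respondent_id] = names[rank * 3 // count]
    return labels


if __name__ == "__main__":
    # Hypothetical respondents at two salary levels.
    sample = [
        (1, "$400-499", 20), (2, "$400-499", 25), (3, "$400-499", 30),
        (4, "$400-499", 35), (5, "$400-499", 40), (6, "$400-499", 50),
        (7, "$900-1099", 40), (8, "$900-1099", 60), (9, "$900-1099", 80),
    ]
    print(split_into_thirds(sample))
```

In the example, respondent 7 names $40 and respondent 5 names $40 as well, yet they receive different labels because each is ranked only against others at the same salary level.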

Table 3 indicates that when current salary 
level is controlled, college graduates are sig- 
nificantly more apt to have a higher percep- 
tion of what is an “average” salary increase 
than are noncollege graduates, and the trend 
is such as to suggest that possibly this is more 
true of college graduates at the bachelor’s 
degree level than of individuals with gradu-
ate degrees. The data in Table 4 reflect dif-
ferences in the perceptions of average salary 
increases with age and indicate an increasingly 
large fraction of respondents with relatively 
low salary expectations in successively older 
groups, while younger employees are more 
apt to report high expectations. Table 5 sug- 
gests that employees exempt from the pro- 
visions of the Fair Labor Standards Act are 
perhaps somewhat more apt to have higher 
salary increase expectations than are non- 
exempt employees. Based upon the data in 
Table 6, there are no evident differences be- 


"Large" vs. 
"Very Large" 


"Average" vs. 
"Large" 


"Small" vs. "Average" 


"Just Noticeable" vs. 
"Small" 


2000 3000 


Current Monthly Salary 


Fic. 1. Perceptions of salary increments as a function of current salary (males only). 
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Fic. 2. Perceptions of salary increments as a function of current salary (females only). 


tween males and females in relative expecta- 
tions regarding salary increases. 

The types of demographic variables investi- 
gated in Tables 3-6 are clearly not mutually 
exclusive, and a series of two-variable cross- 
tabulations such as this goes only part way 
in clarifying which are the most critical factors 
in explaining perceptions of salary increases 
or in identifying possible interactions among 
the variables. However, an analysis using 
AID—a computer program designed to iden- 


tify the optimal combinations of an array 
of independent variables explaining a selected 
dependent variable (Sonquist & Morgan, 
1964)—does adequately handle this problem. 
Figure 3 summarizes the results of this analy- 
sis and shows the “tree” of relationships that 
sequentially explains the most variance in 
the dependent variable of relative perceptions 
of average salary increases, with current 
actual salary level controlled. 
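
The core step of an AID-style analysis, choosing the single two-way split that explains the most variance in the dependent variable, can be sketched as below. This is a simplified illustration under stated assumptions, not the Sonquist and Morgan program: the function names and the six respondents are hypothetical, and only one split is computed. The dependent variable is coded 1 to 3 for the lower, middle, and upper third.

```python
def between_group_ss(group_a, group_b):
    """Between-group sum of squares for a two-way split of y values."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    return n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2

def best_binary_split(rows, predictors, target):
    """Return (predictor, left_categories, bss) for the strongest split.

    For each predictor, categories are ordered by their mean on `target`
    and every cut point along that ordering is tried.
    """
    best = None
    for name in predictors:
        by_cat = {}
        for row in rows:
            by_cat.setdefault(row[name], []).append(row[target])
        ordered = sorted(by_cat, key=lambda c: sum(by_cat[c]) / len(by_cat[c]))
        for cut in range(1, len(ordered)):
            left = [y for c in ordered[:cut] for y in by_cat[c]]
            right = [y for c in ordered[cut:] for y in by_cat[c]]
            bss = between_group_ss(left, right)
            if best is None or bss > best[2]:
                best = (name, set(ordered[:cut]), bss)
    return best

# Hypothetical respondents; y is coded 1-3 (lower, middle, upper third).
rows = [
    {"college": "yes", "sex": "M", "y": 3},
    {"college": "yes", "sex": "F", "y": 3},
    {"college": "yes", "sex": "M", "y": 2},
    {"college": "no", "sex": "F", "y": 1},
    {"college": "no", "sex": "M", "y": 2},
    {"college": "no", "sex": "F", "y": 1},
]
split = best_binary_split(rows, ["college", "sex"], "y")
```

Applying the same step again to each resulting subgroup yields a tree, and a variable such as sex can then emerge as important in one branch even when it shows no effect in the sample as a whole.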

The most “critical” demographic variable— 
TABLE 3 

RELATIVE PERCEIVED “AVERAGE” INCREASE 
VERSUS EDUCATION 

                                   Educational level 
Relative perception of        None        Bachelor’s   Graduate 
“average” increase            college 
Upper one-third, within 
  earnings group              241         268          52 
                              (30.9%)     (48.7%)      (41.9%) 
Middle one-third              252         147          39 
                              (32.3%)     (26.7%)      (31.5%) 
Lower one-third, within 
  earnings group              287         135          33 
                              (36.8%)     (24.6%)      (26.6%) 
Total                         780         550          124 
                              (100%)      (100%)       (100%) 

Note.—Current earnings held constant. χ² = 46.79, p < .01. 


TABLE 5 

RELATIVE PERCEIVED AVERAGE INCREASE 
VERSUS EMPLOYEE STATUS 

Relative perception of        Nonexempt   Exempt       Manager 
average increase                          nonmanager 
Upper one-third, within 
  earnings group              222         238          102 
                              (34.9%)     (42.0%)      (38.5%) 
Middle one-third              204         164          74 
                              (32.1%)     (28.9%)      (27.9%) 
Lower one-third, within 
  earnings group              210         165          89 
                              (33.0%)     (29.1%)      (33.6%) 
Total                         636         567          265 
                              (100%)      (100%)       (100%) 

Note.—Current earnings held constant. χ² = 7.27, p < .10. 


that is, the one that explains the most vari- 
ance in the dependent variable—is education. 
In line with the findings of Klein and Maher 
(1966) and of Penzer (1969), college- 
educated employees have significantly higher 
expectations regarding salary increases than 
do noncollege graduates. 
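
The chi-square value reported for the education cross-tabulation can be checked directly from the cell counts in Table 3. The sketch below uses the ordinary Pearson statistic. The function name is an assumption, and nothing here is taken from the authors' own computations.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Table 3 cell counts: columns are no college, bachelor's, graduate degree.
table_3 = [
    [241, 268, 52],   # upper one-third
    [252, 147, 39],   # middle one-third
    [287, 135, 33],   # lower one-third
]
chi2, df = pearson_chi_square(table_3)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

With these counts the statistic agrees with the reported value of 46.79 to within rounding, on 4 degrees of freedom.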

Age is the next most important variable ac- 
counting for differences in perceptions of in- 
creases. For both college and noncollege em- 
ployees, younger employees have significantly 


TABLE 4 

RELATIVE PERCEIVED “AVERAGE” INCREASE 
VERSUS AGE 

                                         Age 
Relative perception of        Under 30    30-39        40 or older 
“average” increase 
Upper one-third, within 
  earnings group              224         252          76 
                              (41.2%)     (38.4%)      (29.6%) 
Middle one-third              189         183          70 
                              (34.7%)     (27.8%)      (27.2%) 
Lower one-third, within 
  earnings group              131         222          111 
                              (24.1%)     (33.8%)      (43.2%) 
Total                         544         657          257 
                              (100%)      (100%)       (100%) 

Note.—Current earnings held constant. χ² = 33.39, p < .01. 


higher expectations than do older. The opti- 
mal split in explaining perceptions of salary 
increases (based on data coded in 5-yr. class 
intervals) occurs at age 25 for the noncollege 
group, and at age 30 for the college graduates, 
suggesting that perhaps the key factor is the 
individual’s number of years of exposure to 
industrial work and salary administration 
practices, rather than his chronological age 
per se. 

The sex variable further explains relative 
perceptions of salary increases only for the 
younger noncollege group, with males having 
significantly higher expectations than females. 





TABLE 6 

RELATIVE PERCEIVED “AVERAGE” INCREASE 
VERSUS SEX 

                                      Sex 
Relative perception of        Male        Female 
“average” increase 
Upper one-third, within 
  earnings group              428         135 
                              (38.8%)     (36.8%) 
Middle one-third              327         115 
                              (29.7%)     (31.3%) 
Lower one-third, within 
  earnings group              347         117 
                              (31.5%)     (31.9%) 
Total                         1102        367 
                              (100%)      (100%) 

Note.—Current earnings held constant. χ² = .713, ns. 
[Figure 3: “tree” of demographic splits. Dependent variable: 
Relative Perception of an “Average” Dollar Increase (Y).] 

Y coded as: 
Y = 3: The individual falls in the upper 1/3 of the distribution 
of indicated increase amounts, within a given current earnings group. 
Y = 2: In middle 1/3. 
Y = 1: In lower 1/3 of distribution of increase amounts. 

Fig. 3. Demographic characteristics explaining relative perceived “average” salary increases 
(current earnings held constant). 


This finding highlights the importance of 
assessing interactions in an analysis such as 
this, as the simple cross-tabulation in Table 6 
does not reflect any differences between males 
and females. 

The demographic variable of occupational 
level—nonexempt versus nonmanagerial versus 
managers—does not contribute any significant 
unique explanation of variance in perception 
of increases, no doubt reflecting the high cor- 
relation between this variable and education 
level and age. 


Discussion 


In this presentation we have deliberately 
avoided an analysis of opinion questionnaire 
data dealing directly with satisfaction with 
salary and ratings of earnings expectations in 
favor of an analysis that deals with 
perceptions regarding actual dollar amounts 
and demographic characteristics. Hopefully, 
by maintaining a reasonably concrete referent 
in the form of actual dollars for the percep- 
tual data that are presented, we have avoided 
some of the problems of ambiguity of referent 


and response biases such as halo and acquies- 
cence that sometimes occur in opinion and 
satisfaction data. 

The picture that emerges from this analysis 
suggests that an individual’s current level of 
earnings is probably one of the most powerful 
variables affecting how he perceives a given 
amount of incremental earnings. Certainly, as 
common sense would tell us, a fixed salary 
increase—for example, of $25 per month—has 
vastly different meaning to different individu- 
als, largely as a function of their current 
earnings level. To conclude that differences 
in satisfaction regarding pay are not related 
to differences in actual level of pay, as do 
Klein and Maher (1966, p. 205), takes as a 
“given” the fact of a system of compensation 
with differing norms regarding money depend- 
ing upon current level of earnings. (Actually, 
the current data suggest increasing satisfac- 
tion with salary with higher levels of earnings, 
though these were not presented nor are they 
necessary to the analysis here.) The current 
data would suggest that this normative struc- 
ture—which in actuality can be thought of as 
represented by the organization’s salary struc- 
ture—is probably the most powerful variable 
influencing perceptions regarding money. 
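
The point about a fixed increase can be made concrete by expressing it as a fraction of current salary, the same ratio on which the Weber fractions rest. In the sketch below the $25 figure comes from the text, while the three salary levels and the function name are hypothetical.

```python
def relative_increase(increase, current_salary):
    """Increase expressed as a fraction of current monthly salary."""
    return increase / current_salary

# The $25 figure comes from the text; the salary levels are hypothetical.
for salary in (500, 1000, 2000):
    share = relative_increase(25, salary)
    print(f"${salary} per month: $25 is {share:.2%} of current salary")
```

The same dollar amount shrinks from 5% to 1.25% of salary across these levels, which is why it cannot be expected to carry the same meaning for each recipient.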

Beyond that, however, other situational 
characteristics, largely demographic in nature, 
seem to influence employees’ perceptions of 
money within this population. The analysis 
suggests that college graduates, younger em- 
ployees, and male employees holding non- 
exempt positions tend to have relatively high 
thresholds with regard to the motivational 
potential of money. That is, one would expect 
that if current salary levels were held con- 
stant, male nonexempt employees would be 
less satisfied than females, college graduates 
would be less satisfied than noncollege gradu- 
ates, younger employees would be somewhat 
less satisfied than older, and exempt em- 
ployees overall would tend to be less satisfied 
than nonexempt. 

Within the framework of the psychophysi- 
cal model regarding the perception of salary 
increases, we perhaps may view these charac- 
teristics as analogous to sources of “constant 
error” such as occur in laboratory studies.2 
Such “errors,” in addition to operating within 
any one salary level, could also be expected 
to demonstrate effects across levels, an effect 
useful in explaining the apparent tendency 
for the “Weber fractions” in Tables 1 and 2 
to decrease slightly for successively higher 
salary levels. Since salary levels themselves 
are correlated with these demographic charac- 
teristics, we could reasonably expect some- 
what larger “Weber fractions” for relatively 
low salary levels than for higher levels. 

The nature of these trends for demographic 
variables fits in with an expectancy hypothesis 
regarding earnings suggesting that those in- 
dividuals possessing personal characteristics 
ordinarily associated with relatively high 
levels of earnings potential—college educa- 
tion, youth, males (among nonexempts)—will 
tend to have higher expectations with regard 
to earnings than will others, and that these 
expectations in turn are reflected in a higher 


2The author is indebted to Allen I. Kraut for 
his contribution in pointing out this analogy. 
threshold of what is perceived as an accept- 
able salary increase. On the other hand, 
people with lower expectations will “settle for 
less.” These trends also fit in with the concept 
of reference groups, which suggests that 
satisfactions within industry are dependent to 
at least some extent on the reference group 
that the individual uses as a norm for 
evaluating his own situation. But, as pointed 
out above, probably an even more potent 
referent for salary perceptions than an indi- 
vidual’s reference group is the very basic 
compensation issue of “where I am now.” 
By taking both factors into account, it may 
be possible to build relatively comprehensive 
models of individuals’ perceptions regarding 
equitable compensation. 
Two hundred management students role played the “Change of Work Procedure” 
case in a study designed to determine ways in which performance affects leader 
behavior. Through changes in the foreman’s roles, groups were 
assigned to a high performance, low performance, or control condition. High 
past performance was found to increase leader supportiveness, interaction 
facilitation, goal emphasis, and work facilitation behaviors, as well as member 
influence, group cohesiveness, and satisfaction. Thus, theories of leadership 
should consider performance as a cause as well as an effect of leader behavior. 


Behavioral scientists (see, e.g., Blake & 
Mouton, 1964; Likert, 1961; or McGregor, 
1960) have argued strongly that leadership 
behavior affects the performance of subordi- 
nates. Evidence for this argument has come 
from a number of correlational studies, for 
example, the work at the Institute for Social 
Research at the University of Michigan in the 
early 1950s (e.g., Katz & Kahn, 1952, 1960; 
Katz, Maccoby, Gurin, & Floor, 1951; Katz, 
Maccoby, & Morse, 1950; or Likert, 1961), 
and from a few experiments. In one experi- 
ment, Jackson (1953) found that when super- 
visors of work groups were transferred to 
other groups, the new subordinates perceived 
them in substantially the same manner as the 
original group. Apparently the supervisors 
maintained their style of leadership regard- 
less of characteristics of the group being 
supervised. In another study, Day and 
Hamblin (1964) reported that feelings of 
aggression and the productivity of under- 
graduate women in a laboratory group varied 
according to two dimensions of leadership: 
close versus general and punitive versus non- 
punitive. 

1This research was supported in part by a grant 
from the Alfred P. Sloan Research Fund. 

2The authors are grateful for the assistance of 
Eldon E. Senner and James R. Stinger in various 
phases of this research. A portion of this research 
is based upon a dissertation submitted in partial 
fulfillment of the requirements for the Master of 
Science degree by the junior author in June 1968. 

Requests for reprints should be sent to George 
F. Farris, Massachusetts Institute of Technology, 
Sloan School of Management, Room 52-590, 50 
Memorial Drive, Cambridge, Massachusetts 02139. 
Although the findings of these experiments 
indicate that leadership can affect perform- 
ance, the possibility remains that the per- 
formance of the subordinates can also affect 
leadership. The findings of the correlational 
studies of leadership can be interpreted in 
this way. Moreover, in a recent longitudi- 
nal study, Farris (1969) found consistently 
stronger relationships between performance 
and several aspects of “leadership climate” 
when performance was measured first. 
“Leadership climate” appeared to follow per- 
formance more than performance followed 
“leadership climate.” 

A full understanding of leadership behavior 
requires that it be studied as a dependent 
as well as an independent variable. To the 
extent that performance affects leadership, 
causal interpretations of correlations between 
leadership and performance should allow for 
the possibility that leadership behavior is 
affected by performance. The present study 
examines experimentally the effects of per- 
formance upon four aspects of leadership 
behavior suggested by Bowers and Seashore 
(1966): support, interaction facilitation, goal 
emphasis, and work facilitation. It was pre- 
dicted that each of these four leadership 
factors, which have been found to be posi- 
tively correlated with different measures of 
performance, would be caused by performance. 

A second set of predictions was concerned 
with feelings about the group and its discus- 
sion process. It was predicted that when the 
leader was told that his group was “high 
performing,” the leader and subordinates 


EFFECTS OF PERFORMANCE 


would feel more satisfied with their group 
and its discussion process, more cohesive, and 
able to be more productive in the future, and 
the subordinates would feel better able to 
influence the discussion process. 


PREDICTIONS 


Hypothesis 1: Leaders told that they have 
high-producing groups will be seen by their 
subordinates as showing more “good leader- 
ship” behavior than leaders told that they 
have low-producing groups. This fundamental 
hypothesis of the present study is based upon 
the assumption that positive correlations 
found between performance and leadership in 
past studies are due in part to perform- 
ance affecting leadership behavior. Perform- 
ance is predicted to affect leadership in four 
areas: 

Hypothesis 1a: Support. When compared to 
leaders who are told they have low-producing 
groups, leaders told they have high-producing 
groups will be seen by their subordinates as 
more sensitive to subordinates’ needs and 
feelings, more apt to give recognition for good 
work, more trustful of the subordinates, less 
punitive and critical, and less apt to exert 
unreasonable pressure. 

Hypothesis 1b: Goal emphasis. When com- 
pared to leaders who are told they have low- 
producing groups, leaders told they have 
high-producing groups will be seen by their 
subordinates as more apt to let subordi- 
nates know what is expected from them, 
maintain high performance standards, stress 
group pride, and stress being ahead of the 
competition. 

Hypothesis 1c: Work facilitation. When 
compared to leaders who are told they have 
low-producing groups, leaders told they have 
high-producing groups will be seen by their 
subordinates as more apt to explain suggested 
job changes and to allow freedom in the work 
but less apt to decide in detail what shall be 
done and to impose their own preferred solu- 
tions in problem solving. 

Hypothesis 1d: Interaction facilitation. 
When compared to leaders who are told they 
have low-producing groups, leaders told they 
have high-producing groups will be seen by 
their subordinates as more apt to encourage 
speaking out, communicate clearly and ef- 
fectively, emphasize teamwork, be open to 
influence, and be sensitive to differences 
between people. 

Hypothesis 2: Subordinates in the high per- 
formance condition will have more influence 
during the discussion and be more satisfied 
with this influence than subordinates in the 
low performance condition. No differences are 
predicted for leader influence according to 
past performance of the group. Consistent 
with Tannenbaum’s (1962) concept of an 
“expanding influence pie,” it is anticipated 
that the leaders will maintain a relatively 
high degree of influence for themselves re- 
gardless of past performance, but that the 
subordinates will be allowed more influence 
when their past performance has been rela- 
tively high. Past performance will affect 
leadership style, which in turn will affect felt 
influence. 

Hypothesis 3: Groups in the high perform- 
ance condition will be more cohesive than 
groups in the low performance condition. In 
the high performance condition the subordi- 
nates will like each other more than in the 
low performance condition, and they will be 
less apt to want to change groups or leaders. 
Moreover, the leaders in the high performance 
condition will like their subordinates better 
and be less apt to want to change groups. 
Past performance will affect leadership style 
which will in turn affect members’ attraction 
to their group. 

Hypothesis 4: In the high performance 
condition as compared to the low performance 
condition, leader and subordinates will be 
more satisfied with each other, with the dis- 
cussion, and with the solution. Subordinates 
will be more satisfied with their jobs and 
with their fellow subordinates. This greater 
satisfaction is anticipated as a consequence of 
the “better leadership,” higher total influence, 
and greater group cohesiveness that will occur 
in the high performance condition. 

Hypothesis 5: In the high performance 
condition as compared to the low performance 
condition, leader and subordinates will esti- 
mate greater efforts to achieve high perform- 
ance and greater increases in future produc- 
tion. This increase in production is anticipated 
as a consequence of the “better leadership,” 
higher total influence, greater cohesiveness, 
and greater satisfaction that will occur in 
the high performance condition. 


METHOD 
Subjects 


Two hundred persons participated in the study 
as members of 50 four-man groups role playing 
Maier’s (Maier, Solem, & Maier, 1957) Change of 
Work Procedure case. The Ss were male graduate 
students in behavioral science courses at the Massa- 
chusetts Institute of Technology’s Sloan School of 
Management and male MIT undergraduates in 
introductory management or behavioral science 
courses. 


Task 


The Change of Work Procedure case involves a 
foreman and three workers who assemble fuel pumps 
in an automobile company. Maier and Hoffman 
(1960) describe it as follows: 


The assembly operation is divided into three posi- 
tions and the workers have adopted a system of 
hourly rotation among the three jobs. The role- 
playing consists of a meeting called by the foreman 
to discuss the possibility of their changing their 
work method to one in which each man works 
on one position only, his best position according 
to the time study data given to the foreman. 
Although theoretically the new method should 
increase the productivity of the workers and thus 
increase their piece-rate wages, the foreman’s sug- 
gestion of a change to the new method usually 
meets with considerable resistance [p. 279]. 


Boredom from working on only one position is 
an important source of worker resistance to the 
suggested change. 

The possible solutions to the case vary in quality 
and conformance to the wishes of the workers and 
the foreman: old (favored by the workers), new 
(preferred by the foreman), and integrative (an 
innovative solution combining positive aspects of 
the old and new solutions). The case has been 
used extensively for research purposes in the past 
(Hoffman, 1959; Hoffman, Harburg, & Maier, 1962; 
Maier, 1953; Maier & Hoffman, 1960, 1961; Maier & 
Solem, 1962). 


Performance Manipulation 


The 50 groups were randomly assigned to a high 
performance, low performance, or control condition 
by modifying the figures in the time-study report 
given to the foreman. In addition, the roles for the 
foremen in the high performance condition were 
modified by adding the statement: “This rate of 
125% of average makes it one of the ten highest 
producing groups out of 50 groups in the company.” 
In the low performance condition, foremen were 


3The authors are grateful to Thomas J. Allen, 
William H. Gruber, David A. Kolb, Donald G. 
Marquis, and Irwin M. Rubin for allowing their 
classes to participate in the study. 


GEORGE F. FARRIS AND FRANCIS G. LIM, JR. 


told, “This rate of 75% of average makes it 1 of 
the 10 lowest producing groups out of 50 groups 
in the company.” Foremen in the control condition 
and workers in all three conditions received the 
standard role instructions (Maier et al., 1957). 
Twenty groups were assigned to the high perform- 
ance condition, 20 to the low, and 10 to the control. 
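
The assignment of groups to conditions and the corresponding change to the foreman's role can be sketched as follows. This is an illustration only: the function names and the seed are assumptions, the shuffling method is not the authors' documented procedure, and the modified time-study figures are not modeled. The two quoted statements are those given in the text.

```python
import random

def assign_conditions(n_high=20, n_low=20, n_control=10, seed=None):
    """Return a shuffled list of condition labels, one per group."""
    labels = ["high"] * n_high + ["low"] * n_low + ["control"] * n_control
    rng = random.Random(seed)
    rng.shuffle(labels)
    return labels

def foreman_statement(condition):
    """Extra sentence added to the foreman's role, if any."""
    if condition == "high":
        return ("This rate of 125% of average makes it one of the ten highest "
                "producing groups out of 50 groups in the company.")
    if condition == "low":
        return ("This rate of 75% of average makes it 1 of the 10 lowest "
                "producing groups out of 50 groups in the company.")
    return None  # control foremen received the standard role instructions

conditions = assign_conditions(seed=1970)  # seed is arbitrary
```

Fixing the cell sizes in advance and shuffling the labels guarantees the 20, 20, and 10 split while leaving the identity of the groups in each condition to chance.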


Procedure 


The multiple-role-playing procedure (Maier, 1952) 
was used to administer the case during regular class 
time. The investigator read the general instructions 
to all groups in each class and distributed the roles 
to each group member, foreman and workers being 
assigned roles randomly. After the members had 
read their roles, the groups were asked to start 
solving the problem and to come up with a solution 
in 20 min. A 2-min. warning was given at the end 
of 18 min., and all discussion ceased at the end of 
20 min. Roles were collected, and short question- 
naires were administered to the foreman and three 
workers in each group. Each questionnaire took 
about 5 min. to complete. 


Measurements 


Perceptions of the behavior of the foreman and 
data on some characteristics of the decision process 
were obtained from each worker through the ques- 
tionnaires. On his questionnaire the foreman re- 
ported the solution, perceptions of the discussion, 
and evaluations of the workers. Most items con- 
sisted of descriptive statements followed by 7-point 
scales and had been used in previous correlational 
studies of leadership and group behavior (Fleishman, 
Harris, & Burtt, 1955; Likert, 1961; Stogdill, 1965; 
various questionnaire studies of the Institute for 
Social Research, University of Michigan). They are 
summarized in Tables 1-5. 


Analysis 


Several factor analyses were performed on the 
18 leadership items using different samples of worker 
and observer data. In general these analyses sup- 
ported Bowers and Seashore’s (1966) four-factor 
theory. However, it was also possible to extract two, 
three, five, and six orthogonal factors,5 and some 


4In 24 of the groups (12 high performance and 
12 low performance), an additional student was 
randomly assigned to serve as an observer and com- 
plete a questionnaire virtually identical with that 
of the workers. Results from these untrained ob- 
servers were very similar to those of the workers 
in describing foreman behavior, but quite different 
in questions that ascribed feelings to the foreman 
and workers. For details, see Lim (1968). 

5When two factors were extracted, the first ap- 
peared to be a combination of interaction facilita- 
tion and support, while the second combined goal 
emphasis and work facilitation. When three factors 
were extracted, they appeared to be (a) interaction 
facilitation and support, (b) goal emphasis, and (c) 
work facilitation and close supervision. 
inconsistencies were found in factor structure accord- 
ing to the particular sample examined. For example, 
the item “unreasonable pressure for better perform- 
ance” was more strongly associated with a support 
factor for one sample and a goal emphasis factor 
for another sample. Because of these inconsistencies 
it was decided to report findings for individual 
items, grouped by their content into Bowers and 
Seashore’s four factors. This grouping was carried 
out so as to be as consistent as possible with the 
results of the factor analyses that were done. 

In order to test the hypotheses, t tests were per- 
formed comparing the high and low performance 
conditions. On all but three items, mentioned below, 
the groups in the control condition scored between 
the high and low groups or not significantly different 
from them. Therefore, their data are not shown 
below. 
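
A comparison of the high and low conditions on a single item can be sketched as a two-sample t test. The text does not say which form of the test was used, so the pooled-variance version below is an assumption, as are the function name and the ratings, which are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(high, low):
    """Student's t for two independent samples, pooled variance.

    Returns (t, degrees_of_freedom).
    """
    n_h, n_l = len(high), len(low)
    df = n_h + n_l - 2
    pooled_var = ((n_h - 1) * variance(high) + (n_l - 1) * variance(low)) / df
    standard_error = sqrt(pooled_var * (1 / n_h + 1 / n_l))
    return (mean(high) - mean(low)) / standard_error, df

# Hypothetical 7-point ratings on one leadership item.
high_ratings = [5, 6, 4, 5, 6, 5]
low_ratings = [4, 3, 5, 4, 3, 4]
t, df = pooled_t(high_ratings, low_ratings)
```

A positive t here means the item was rated higher in the high performance condition. The resulting value would be referred to a t table at the stated degrees of freedom.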


RESULTS 
Validation of Experimental Manipulation 


In order to determine whether the fore- 
men responded to the information in their 
roles about the group’s past performance, 
foremen were asked to indicate after the dis- 
cussion how their groups had compared to 
others in the company before the discussion. 
On a 5-point scale where 5 equals “much 
above average,” the foremen in the high 
condition scored 4.4, while those in the low 
condition scored 1.8, and the controls scored 
3.5 (p < .001). Apparently the people playing 
the role of foreman were consciously aware of 
their groups’ past performance while the dis- 
cussion was being conducted. 

Hypothesis 1: Performance affects leader- 
ship. Subordinate perceptions of leader be- 
havior are summarized in Table 1. Of 18 
items describing leader behavior, results for 
16 are in the predicted direction, and results 
for 11 items are statistically significant at the 
.05 level of confidence. Performance appar- 
ently affects a wide variety of leader behav- 
iors. Examination of the four areas of leader 
behavior shows that these general findings 
hold for all areas, but that differences between 
high and low performance appear to vary 
according to area. Assuming that the scales 
are comparable, past performance appears to 
have its greatest effects on support and its 
least effects on work facilitation, with goal 
emphasis and interaction facilitation being 
about equally susceptible to influence by past 
performance. Leaders told that their groups 
are high performing are significantly more 
TABLE 1 

HYPOTHESIS 1: LEADER BEHAVIOR AS A FUNCTION 
OF PAST PERFORMANCE 

                                          Mean amount of behavior 
                                          characteristic 
Behavior characteristic                   Past performance 
                                          High          Low 
Support 
  Sensitive to needs and feelings of 
    workers                               [illegible]   4.2* 
  Gives recognition for a job well 
    done                                  4.2           [illegible] 
  Has trust and confidence in his 
    men                                   [illegible]   4.2** 
  Punitive or critical of group’s 
    performance                           1.8           [illegible] 
  Exerts unreasonable pressure for 
    better performance                    2.8           [illegible] 
Goal emphasis 
  Lets group members know what 
    is expected of them                   4.2           4.4 
  Maintains high performance 
    standards                             [illegible]   [illegible] 
  Stresses a feeling of pride in the 
    group                                 4.6           [illegible] 
  Stresses being ahead of competing 
    work groups                           4.6           3.9 
Work facilitation 
  Gives reasons for suggested 
    changes on the job                    [illegible]   4.9 
  Allows members freedom and 
    autonomy in their work                4.8           4.1* 
  Decides in detail what shall be 
    done and how                          2.4           2.8 
  Tries to impose his preferred 
    solution on the group                 4.2           4.2 
Interaction facilitation 
  Encourages speaking out and 
    listens with respect                  [illegible]   4.9* 
  Communicates clearly and 
    effectively                           4.8           [illegible] 
  Emphasizes that people work 
    together as a team                    4.0           [illegible] 
  Open to influence from his 
    workers                               4.8           4.6 
  Sensitive to differences between 
    people                                4.0           3.4 

*p < .05. 
**p < .001. 


likely than leaders told that their groups are 
low performing to be seen by their subordi- 
nates as sensitive, giving recognition, trusting, 
nonpunitive, exerting less unreasonable pres- 
sure for performance, maintaining high per- 
formance standards, stressing a feeling of pride 
in the group, allowing freedom, encouraging 
speaking out, communicating clearly, and 
emphasizing teamwork. 

Hypothesis 2: Performance affects influ- 
ence. Table 2 shows that Hypothesis 2 was 
strongly supported. In the high condition sub- 
ordinates felt they had more influence in the 
discussion and were more satisfied with their 
influence than subordinates in the low condi- 
tion. No differences were found in leader 
influence or satisfaction with influence accord- 
ing to past performance. The leaders per- 
ceived that two of the three subordinates had 
more influence in the high performance condi- 
tion and one had more influence in the low 
performance condition. Apparently Tannen- 
baum’s (1962) notion of the expanding influ- 
ence pie holds in this study. With high past 


TABLE 2 

HYPOTHESIS 2: INFLUENCE AS A FUNCTION 
OF PAST PERFORMANCE 

                                          Mean amount of influence 
                                          or satisfaction with influence 
Measure of influence or                   Past performance 
satisfaction with influence               High          Low 
Subordinate perception of own 
  influence                               4.6           4.2* 
Leader perception of Worker 1’s 
  influence                               4.2           5.4 
Leader perception of Worker 2’s 
  influence                               4.5           4.2 
Leader perception of Worker 3’s 
  influence                               4.8           4.0* 
Subordinate satisfaction with own 
  influence                               [illegible]   4.5** 
Leader’s perception of own 
  influence                               4.6           4.8 
Subordinates’ perception of 
  foreman’s influence                     4.1           3.8 
Leader’s satisfaction with own 
  influence                               4.1           4.4 
TABLE 3 

HYPOTHESIS 3: COHESIVENESS AS A FUNCTION 
OF PAST PERFORMANCE 

                                          Mean agreement 
                                          with statement 
Statement                                 Past performance 
                                          High          Low 
By subordinates: 
  I like the workers in my group.         6.2           5.9 
  If same work with different 
    group, I’d move.                      1.9           2.0 
  If same work under different 
    foreman, I’d move.                    [illegible]   [illegible] 
By leaders: 
  I like the men with whom I work.        [illegible]   4.6 
  If supervise different group, 
    same work, I’d move.                  [illegible]   4.0** 

*p < .05. 
**p < .001. 


performance, subordinates’ influence increased 
while the leader’s influence remained constant. 
Hypothesis 3: Performance affects cohesive- 
ness. Table 3 shows that in the high perform- 
ance condition subordinates liked each other 
better and wanted less to change the foremen 
than subordinates in the low performance 
condition. In neither condition were subordi- 
nates very disposed toward working with a 
different group of colleagues. Leaders in the 
high performance condition tended to like 
their subordinates more and were much less 
prone to change work groups. Apparently past 
performance affects attraction to a group, and 
especially leader-member attraction. Probably 
this effect of performance on cohesiveness 
occurs through its effect on leader behavior, 
which in turn affects cohesiveness. 
Hypothesis 4: Performance affects satisfac- 
tion. Table 4 shows that subordinates in the 
high performance condition were significantly 
more satisfied with their fellow workers, their 
foreman, the discussion, and the solution than 
subordinates in the low performance condi- 
tion. Subordinates in the high performance 
condition also tended to be more satisfied 
with their jobs. The leader was significantly 
more satisfied with his work group and tended 
to be more satisfied with the discussion and 
solution in the high performance condition. 
Apparently past performance affects satisfac- 
tion, probably through its influence upon 
leader behavior. 

Hypothesis 5: Performance affects future 
production. Table 5 shows that in the high 
performance condition, both the leaders and 
subordinates saw their groups as trying 
harder to achieve high performance than in 
the low performance condition, and the lead- 
ers in the high condition thought that their 
groups would maintain a higher standing in 
overall company performance. However, no 
significant differences were found according to 
experimental condition in changes anticipated 
in future production. These findings lead one 
to suspect that the differences obtained were 
due largely to the initial “set” about group 
performance created by the experimental in- 
structions rather than to the discussion proc- 
ess itself. Had the discussion process affected 
feelings about future production, differences 
would have occurred according to experimen- 
tal condition in anticipated changes in future 
production as well as in the relative standing 
of the groups in the company. 

TABLE 4 

HYPOTHESIS 4: SATISFACTION AS A FUNCTION 
OF PAST PERFORMANCE 

                          Mean amount of satisfaction 
                          Past performance 
Satisfaction with         High           Low 
By subordinates: 
  Fellow workers          6.1            5.6** 
  Foreman                 [illegible]    4.6* 
  Job                     5.5            [illegible] 
  Discussion              4.6            3.8* 
  Solution                5.5            [illegible] 
By leader: 
  Work group              [illegible]    [illegible] 
  Discussion              [illegible]    4.4 
  Solution                5.2            4.7 
* p < .05. 
** p < .01. 
*** p < .001. 


TABLE 5 

HYPOTHESIS 5: FUTURE PRODUCTION AS A 
FUNCTION OF PAST PERFORMANCE 

                                      Mean estimate 
                                      Past performance 
Estimate                              High           Low 
By subordinates: 
  Group tries hard to achieve 
    high performance                  5.3            4.6* 
  Changes in individual production    [illegible]    [illegible] 
  Changes in future production 
    of group                          [illegible]    3.7 
By leader: 
  Group tries hard to achieve 
    high performance                  4.5            [illegible] 
  Future performance standing 
    of group in company               4.6            [illegible] 
  Changes in future production 
    of group                          4.0            3.9 
* p < .05. 
*** p < .001. 


This interpretation is supported by a 
tabulation of solutions to the case produced 
by the groups in each experimental condition. 
The high performance groups produced 7 high 
quality integrative solutions and 13 lower 
quality old and new solutions. The low performance 
groups produced 10 integrative solutions 
and 10 lower quality solutions. Thus, 
no significant differences occurred in solution 
quality (and therefore probable future performance) 
according to past performance, 
and the tendency was for low past performance 
to be associated with a higher quality 
solution.6 
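
The tabulation above (7 integrative and 13 lower quality solutions in the high performance condition; 10 and 10 in the low) can be checked with a simple 2 x 2 test of independence. The article does not say which test was applied, so the uncorrected Pearson chi-square below is an assumption, offered only as a sketch of why counts like these fall short of significance.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts reported in the text: (integrative, lower quality) per condition.
high_performance = (7, 13)
low_performance = (10, 10)

chi2 = chi_square_2x2(*high_performance, *low_performance)

# Critical value of chi-square for df = 1 at the .05 level.
CRITICAL_05_DF1 = 3.841

print(f"chi-square = {chi2:.2f}")
print("significant at .05:", chi2 > CRITICAL_05_DF1)
```

With these counts the statistic is well below the critical value, which agrees with the statement that solution quality did not differ reliably by condition.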


Control Groups 


In all but three instances the groups that 
received the standard instructions scored between 
the groups in the high and low conditions 
or not significantly different from them. 
These findings add support to the validity of 
the experimental manipulations. In both ex- 
perimental conditions the leader placed more 
stress on being ahead of the competition than 
in the control condition, as one would expect. 
However, in the control condition the leader 
was more likely to decide in detail about 
work activities, and the subordinates estimated 
that their future production would change 
less favorably than in either experimental 
condition. These differences are not readily 
explainable and may have been due to chance. 

6 This tendency reached statistical significance for 
the first 24 groups who participated in this study 
(see Lim, 1968), but was reversed for the last 16 
groups. The authors are currently attempting to 
determine reasons for these differences. 


A Crude Reexamination of Fiedler’s Theories 


An important aspect of Fiedler’s (1965) 
theories of leadership is the ability of the 
effective leader to be sensitive to differences 
between people. Two types of information were 
available in this study that allowed a crude 
test of whether this leadership trait is caused 
by past group performance. First a compari- 
son was made between the high and low per- 
formance conditions in the leader’s being 
“sensitive to differences between people” (see 
Table 1). Leaders who were told that they 
had high-performing groups were seen by 
their subordinates as more sensitive to dif- 
ferences between people, but this difference 
did not quite reach the .05 level of significance 
(p = .06). 

Several items in the leader’s questionnaire 
asked him to rate his three subordinates on 
7-point scales on four characteristics: being an 
idea man, being a trouble maker, having in- 
fluence in the discussion, and promotability. 
A tabulation was made of differences each 
leader saw between his subordinates on each 
of these scales. A comparison of leaders in the 
high and low performance conditions showed 
no differences on the average in the extent to 
which they saw differences between their men. 
A tendency occurred in only one instance for 
past performance to affect the leader’s sensitivity 
to differences between his men. Leaders 
in the low performance condition saw greater 
differences between their men as trouble 
makers than did leaders in the high perform- 
ance condition. Taken together these two 
analyses suggest, but certainly do not demon- 
strate, that a leader’s sensitivity to differ- 
ences between people may be in part due to 
the past performance of his subordinates as 
a group. 


Comparison with Day and Hamblin’s 
Findings 

Day and Hamblin (1964) found differences 
in group productivity as a consequence of two 
dimensions of leadership that they varied experimentally: 
closeness and punitiveness. In 
the present study measurements were made 
of several characteristics of leader behavior 
dealing with closeness of supervision (e.g., 
unreasonable pressure, decides in detail, im- 
poses own solution, allows freedom, encour- 
ages speaking out) and punitiveness (is 
punitive, sensitive to needs and feelings, gives 
recognition for good work, stresses group 
pride). Our findings indicate that perform- 
ance affects leader behavior on these dimen- 
sions of closeness and punitiveness (see 
Table 1). Together with those of Day and 
Hamblin they show that performance both 
causes and is caused by these characteristics 
of leadership. 


DISCUSSION 


The findings of this study show that past 
performance affects most aspects of leader 
behavior, especially his support, interaction 
facilitation, and goal emphasis. Moreover, 
high past performance and the resulting leader 
behavior are associated with greater subordi- 
nate influence in decision making, greater 
group cohesiveness, and higher satisfaction. 
No clear relationships were found between 
past performance, associated leader behavior, 
and estimates of subsequent changes in group 
performance. 

Unfortunately, the design of this study 
does not allow us to determine precisely the 
processes through which past performance af- 
fects leadership. Leaders given special instruc- 
tions in their roles about their groups’ past 
performance were seen by their subordinates 
as behaving differently according to this past 
performance. These differences in subordinate 
perception may have been due to actual dif- 
ferences in leader behavior. On the other 
hand, subordinates may have learned from 
their leader that they were low or high per- 
forming, attributed this past performance to 
his leadership capabilities, and perceived his 
leadership behavior during the discussion in 
terms of a negative or positive “halo.” Re- 
ports from observers of 24 groups (data pre- 
sented in Lim, 1968), which agree substan- 
tially with subordinate descriptions of leader 
behavior, tend to support the first interpreta- 
tion. Past performance as seen by the leader 
clearly affects subordinate perception of 
leadership and probably actual leader behavior 
as well. 

Although these findings are based on a 
laboratory experiment involving role playing, 
two factors suggest that they may be general- 
ized to “real world” leadership situations. 
First, the particular case employed was de- 
signed to simulate a real situation and has 
been used extensively in previous research. 
Second, the results of this study are consistent 
with those of a recent longitudinal field study 
by Farris (1969) who found stronger rela- 
tionships between performance and organiza- 
tional factors when performance was measured 
first. 

To the extent that these findings can be 
generalized, they indicate that we should 
extend our theories of leadership and leader- 
ship training practices to account for ways 
in which leader behavior can (and perhaps 
should) occur as a consequence of past per- 
formance. Moreover, we should be especially 
careful in interpreting single-point-in-time 
correlations between leadership and perform- 
ance as indicating that leadership causes 
performance. Clearly the causal direction can 
be the other way as well. 


SKIMMING LISTS OF FOOD INGREDIENTS PRINTED 
IN DIFFERENT BRIGHTNESS CONTRASTS 1 


E. C. POULTON 2 


Applied Psychology Research Unit, Cambridge, England 


Seventy-six people aged 29-72 yr. searched for particular words in lists of 
ingredients printed on white paper, reflectance 85%, in ink with densities of 
1.3, .4, .2, and .1 (reflectances 4, 34, 53, and 68%). The print was 6-pt. lower-case 
Univers with .6-pt. leading. There were four sets each of 15 lists. A 4 × 4 
factorial design was used that confounded list difficulty with order. In separate 
experiments the lighting was 40 and 2 ftc. There were large drops (p < .01) 
in the rate of locating ingredients when the density of the ink decreased from 
.4 to .1. Increasing the density from .4 to 1.3 had no reliable effect (p > .05). 
Two people failed to locate any ingredients in the poor light when the density 
of the ink was .1. It was concluded that ingredients printed in 6-pt. lower-case 
Univers on white paper should have an ink density of at least .4. The contrast 
ratio between the ink and the paper is then at least 60%; the relative brightness 
ratio is at least 2.5:1. 


Designers of packages sometimes use pale 
letters on pale backgrounds in order to por- 
tray softness or femininity. For other pack- 
ages they may use dark letters on dark back- 
grounds. Some of the recent covers of the 
Journal of Applied Psychology and of Human 
Factors fall into this category. It is clear 
from the threshold determinations of Cobb 
and Moss (1928) that poor contrast between 
letters and background makes discrimination 
more difficult. Either the critical details need 
to be larger, or else more light is required. 

The present experiment is the second in a 
series. In the previous article (Poulton, 1969) 
it was concluded that lists of food ingredients 
should not be printed in Univers smaller than 
6 pt. The present experiment used 6-pt. 
Univers printed on white paper with ink of 
various densities. The aim was to determine 
the lowest density of ink that did not ap- 
preciably retard the housewife when she 
skimmed lists of food ingredients. The same 
two levels of lighting were used as in the 
previous experiment. They are intended to 
represent, respectively, the lighting in a 
supermarket and in a domestic kitchen cupboard. 

1 This research was carried out at the suggestion 
of The Metal Box Co. Ltd., which supplied the 
printed materials. The British Food Manufacturers 
Federation kindly defrayed the cost of the Ss. The 
author is grateful to I. Harris of The Metal Box Co. 
Ltd. for his help and encouragement. Financial support 
from the British Medical Research Council is 
also gratefully acknowledged. 

2 Requests for reprints should be sent to E. C. 
Poulton, Medical Research Council, Applied Psychology 
Research Unit, 15 Chaucer Road, Cambridge, 
England. 

Tinker and Paterson (1931) compared 
various colors of print and background. They 
used their speed of reading method. They 
concluded that speed depended upon the 
brightness contrast between the print and the 
background. Unfortunately they did not 
specify the brightness contrasts that they 
used. 

Other methods have been used to compare 
print and backgrounds with various bright- 
ness contrasts and color contrasts. They are 
not appropriate to the problems of the house- 
wife who is skimming a list of food ingredi- 
ents printed on a package that she has picked 
up from a shelf. One method is to compare 
the distances at which the print can just be 
read. Another method is to compare the 
amount of print that can be read in a brief 
glance, using a tachistoscope. The results have 
been summarized by Tinker (1963, pp. 137–148). 

Williams (1967) recently used a related 
method. Four 3-digit numbers were presented 
at a time. The digits were 9 pt. (3 mm. tall). 
Eighteen men had to search for a particular 
3-digit number, and indicate its position as 
quickly as possible by pressing one of four 
buttons. Reaction time was the chief measure 
of visibility. Williams varied the brightness 
contrast and the lighting, but he did not 
reduce the contrast ratio below 50%. Also 
he varied the brightness contrast by changing 
the reflectance of the background. This 
changes the amount of light reflected by the 
display, and is equivalent to varying the 
lighting (Cobb & Moss, 1928). The effects 
of brightness contrast are therefore con- 
founded with the effects of the amount of light 
reflected into the eyes. 

There were no reliable effects on reaction 
time with an illumination of 6 ftc. With an 
illumination of .6 ftc. there was a reliable 
gap of over 1.0 sec. between the worst condi- 
tion and any of the other conditions. The 
worst condition was black numbers on a 
background with a reflectance of 8%. This 
gives a contrast ratio of (8 - 4)/8 or 50%. 
After the gap the next worst condition was 
black numbers on a background with a reflectance 
of 16%. This gives a contrast ratio 
of (16 - 4)/16 or 75%, and about twice as 
much reflected light. Black numbers on a 
white background had the shortest reaction 
times. This gives a contrast ratio of 
(83 - 4)/83 or 95%, and about 10 times as much 
reflected light as the worst condition. The 
remaining six conditions all fall in the gap 
of .5 sec. between these two. Unfortunately 
the results are due partly to differences in the 
amount of reflected light. They are deter- 
mined only partly by differences in the 
brightness contrast. 


METHOD 


Materials 


The same lists of ingredients were used as in the 
previous experiments (Poulton, 1969). All four sets 
were reproduced photographically in 6-pt. Univers 
with .6-pt. leading between lines. They were printed 
four times in inks with densities of 1.3, .4, .2, and .1 
on white paper. The paper was Spartocote. It had 
a reflectance of about 85% and a slight shine. The 
reflectances of the inks were 4, 34, 53, and 68%, 
respectively. 
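
The reported ink reflectances are close to what one would obtain if each density were an optical density measured relative to the paper, so that reflectance equals paper reflectance times 10 to the power of minus the density. The article does not state this relation; it is an inference from the figures given, and the sketch below only checks how well it reproduces them.

```python
# Assumption (not stated in the article): ink density is an optical density
# relative to the paper, i.e. reflectance = paper_reflectance * 10 ** (-density).
PAPER_REFLECTANCE = 85.0  # percent, as reported

# Densities and reflectances (percent) as reported in the Materials section.
REPORTED = {1.3: 4.0, 0.4: 34.0, 0.2: 53.0, 0.1: 68.0}


def ink_reflectance(density, paper=PAPER_REFLECTANCE):
    """Predicted ink reflectance (percent) under the stated assumption."""
    return paper * 10 ** (-density)


for density, reported in REPORTED.items():
    predicted = ink_reflectance(density)
    print(f"density {density}: predicted {predicted:.1f}%, reported {reported:.0f}%")
```

Under this assumption the predicted values agree with the reported reflectances to within about one percentage point.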


Experimental Design and Subjects 


Two experiments were run with different levels 
of lighting. A total of 36 Ss worked with about 40 
ftc. on the printed pages. Another 40 Ss worked 
with about 2 ftc. In each experiment the four 
densities of ink and four sets were arranged in a 
Latin-square design with four groups of Ss. The 
lists were presented always in the same order, as in 
the previous experiments. The Ss were randomly 
allocated to groups. 

FIG. 1. Average number of ingredients found in 
25 sec. with various densities of ink. Abscissa: 
density of lettering; ordinate: mean number of 
ingredients found in 25 sec. (Unfilled points: 
about 40 ftc. on table top; data from 36 Ss. Filled 
points: about 2 ftc. on table top; data from a 
separate group of 40 Ss.) 
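
The Latin-square arrangement of four ink densities and four sets across four groups of Ss can be sketched as follows. The article does not give the actual square, so the cyclic square below is a hypothetical assignment used only to illustrate the property that matters: each density appears once with each set and once in each group.

```python
def cyclic_latin_square(items):
    """Return an n x n Latin square built by cyclically shifting `items`."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Ink densities from the article; the assignment to groups is hypothetical.
densities = [1.3, 0.4, 0.2, 0.1]
square = cyclic_latin_square(densities)

# Rows are groups of Ss; columns are sets, which were always presented
# in the same order, so the column index also stands for serial position.
for group, row in enumerate(square, start=1):
    cells = ", ".join(f"set {s}: {d}" for s, d in enumerate(row, start=1))
    print(f"Group {group} -> {cells}")
```

Because the sets were always presented in the same order, set and serial position coincide in such a square, which is why the abstract describes list difficulty as confounded with order.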

The 76 Ss were members of a panel maintained 
at the Applied Psychology Research Unit at Cam- 
bridge, England. In the experiment with the bright 
light the ages ranged from 29 to 72 yr. One-third 
were men; the remainder were women. In the 
experiment with the dim light the ages ranged from 
29 to 58 yr. There was only one man. About two-thirds 
wore glasses for reading. Two others said that 
they should have done so. They were paid $.90 (7 
shillings and 6 pence) per hour for their services, 
plus traveling expenses. 


Procedure 


The procedure followed that of the previous ex- 
periments. The Ss were tested in groups, seated at 
tables. There were two practices to show Ss what 
to do. The lists used in the second practice were 
printed in inks of the same density as the first test 
lists. The S had to read a target word on a question 
sheet, find the word in the corresponding list of 
ingredients, and cross it out. There were 15 target 
words, 1 in each of 15 lists. As many target words 
as possible had to be crossed out in the 25 sec. 
available. The practices and the four parts of the 
experiment together took about 15 min. 


RESULTS AND DISCUSSION 


Figure 1 shows the average number of in- 
gredients found in 25 sec. The data have been 
pooled over all Ss. Analysis of variance fol- 
lowed by Tukey’s range test (Ryan, 1959, 
Appendix) was carried out separately for each 
level of lighting. In the bright light, ink 
densities of .4 and 1.3 both produced reliably 
quicker work than ink densities of either .1 
or .2 (p < .05 or better). 
In the dim light, the ink density of .2 
produced reliably quicker work than the ink 
density of .1 (p < .05). The ink density of 
1.3 produced reliably quicker work still 
(p < .05). Two housewives failed to locate 
any ingredients printed with ink densities of 
.1. Both were over 40 yr. old. 

The results in Figure 1 suggest that for the 
6-pt. Univers used, the ink density should not 
be less than .4 when printing on white paper. 
With densities less than .4 there is a sharp 
fall in visibility as measured by the rate of 
locating ingredients. Densities greater than .4 
make little difference to the rate. An ink 
density of .4 corresponds to a contrast ratio 
between the lettering and the white paper of 
(85 - 34)/85 or 60%. The relative brightness 
ratio is 85:34, or 2.5:1. 
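
The arithmetic above can be repeated for all four inks. The sketch below uses only the reflectances reported in the Materials section; the function names are illustrative, and the formulas are the ones used in the text: contrast ratio as (paper - ink)/paper and relative brightness ratio as paper:ink.

```python
PAPER = 85.0  # reflectance of the paper, percent

# Ink density -> ink reflectance (percent), as reported in the article.
INKS = {1.3: 4.0, 0.4: 34.0, 0.2: 53.0, 0.1: 68.0}


def contrast_ratio(paper, ink):
    """Contrast ratio in percent: (paper - ink) / paper."""
    return 100.0 * (paper - ink) / paper


def brightness_ratio(paper, ink):
    """Relative brightness ratio paper:ink, expressed as a single number."""
    return paper / ink


for density, ink in INKS.items():
    print(
        f"density {density}: contrast {contrast_ratio(PAPER, ink):.0f}%, "
        f"brightness ratio {brightness_ratio(PAPER, ink):.2f}:1"
    )
```

The density of .4 gives the 60% and 2.5:1 figures quoted in the text, and the density of .2 gives a contrast ratio of about 38%.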

The results are not really comparable to 
the results of Williams (1967) that were 
referred to in the introduction. Williams found 
no reliable differences with an illumination 
of 6 ftc. This lies between the illumination 
levels of about 40 and 2 ftc. used here, both 
of which gave reliable differences when the 
brightness contrast was varied. Figure 1 shows 
that the rate of work did not fall appreciably 
until the density of the lettering had been 
reduced to .2. This gives a contrast ratio of 
only (85 - 53)/85 or 38%. Williams did not 
include any contrast ratios as low as this in 
his experiment. Thus the lighting had to be 
reduced to .6 ftc. even with dark backgrounds 
in order to produce reliable differences 
between his experimental conditions. 


EFFECT OF INTERVIEWS ON TEACHER 
SELECTION DECISIONS 1 

An experiment was conducted in a simulated situation in which 144 elementary 
school principals made decisions regarding fictitious applicants for a hypothetical 
position. The purpose of the experiment was to determine how interview 
information affects (a) time required to make decisions, (b) feeling of certainty 
regarding decisions, (c) fineness of discriminations made, and (d) consistency 
of decisions. The levels of the interview information were (a) audiovisual 
(via color films), (b) audio (sound track of the films), and (c) no 
interview information. Interviews were scripted and role played to control 
content. The results indicated that interview information increased discrimination 
and time but had no effect on consistency. More certainty resulted when 
both audio and visual stimuli were used; otherwise, audio interview information 
had the same effect as audiovisual information. 


Interviews have been used almost univer- 
sally as a part of the teacher selection process 
for many years. Although widespread use of 
the interview indicates a certain belief in its 
utility, the actual value of the interview can 
be ascertained in only three ways. 

First, the value of the interview can be 
determined by the extent to which it helps 
to predict which teachers will be most suc- 
cessful. The predictive validity of the inter- 
view has been the basis of much research. 
Interviews are not generally predictive; that 
is, they are not generally valid. Rather, their 
validity must be determined in a given situa- 
tion, for particular positions, and following 
specified procedures. Because of the situa- 
tional aspect of establishing the predictive 
validity of interviews, the question of the 
“goodness” of a decision in terms of whether 
the “correct” teacher was selected was intentionally 
omitted from this study. It is assumed 
that local school systems define teacher ef- 
fectiveness according to specified local cri- 
teria; if so, the local system will be able to 
specify the outcomes desired in terms of 
teacher behavior.3 

1 This study was partially supported by a grant 
from the United States Office of Education, Grant 
No. OEC 4-7-061349-0266, Variables Affecting De- 
cision Making in the Selection of Teachers. Director: 
Dale L. Bolton, August 1968. 

2 Requests for reprints should be sent to Dale L. 
Bolton, College of Education, University of Wash- 
ington, Seattle, Washington 98105. 


3 Also, it is assumed that the decision maker who 
can discriminate consistently among teacher applicants 
in a simulated situation can assess teacher 
applicants consistently according to selection criteria 
specified by a local school district. This assumption 
should be verified empirically. 

Second, an indication of the value of the 
interview may be obtained by determining 
how the interview contributes to factors necessary 
for successful prediction. For example, 
unless measures obtained in the interview are 
both discriminating and consistent, predictions 
made from these measures will be meaningless 
or spurious. A nondiscriminating measure approaches 
a constant; therefore, any correlation 
with the measure will approach zero. A 
nonconsistent measure is likely to yield a high 
correlation one time and a low or negative 
correlation another time. Therefore, discrimination 
and consistency are necessary conditions 
for predictive validity, but they are not 
sufficient. The study reported here answers 
the question of whether interviews increase 
discrimination and consistency in selection 
decisions, but the limits of the study do not 
allow conclusions about predictive validity. 

This study extends the work of others who 
have examined the interview’s utility along 
dimensions other than predictive validity. 
Sydiaha (1961, 1962) studied the effect of 
an interviewer’s empathy on the accept-reject 
decision. Springbett (1958) and Bolster and 
Springbett (1961) studied the effect that 
appearance of the applicant and order of 
presentation of interview information made 
on decisions. Levine and McGuire (1968) 
also studied the order effect of interview 
information, noting that early cues tend to 
distort medical diagnoses. Barrett (1958) 
found that psychologists who were required 
to write evaluative reports on patients tended 
to weight interview information more heavily than 
objective data. 

In Giedt’s (1955) study of the accuracy 
of personality judgments by clinical psycholo- 
gists, Ss were presented information about 
patients via visual, verbal, verbal plus sound, 
or audiovisual means. In this setting, Giedt 
concluded that (a) evaluations and predic- 
tions are separate unrelated tasks of the de- 
cision process and that cues affect the two 
tasks differentially; (b) as long as content is 
available, no differences occur in diagnosis; 
and (c) interviewers should be careful to 
avoid being misled by the patient’s appear- 
ance and expressive cues. Giedt’s study sug- 
gested the need to measure predictions or 
estimations of consequences as well as relative 
evaluations of applicants. Likewise, Giedt 
suggested the need to manipulate audio and 
visual cues rather than study the interview 
process in its natural setting as Sydiaha did 
in his 1961 study. 

The third indication of the value of the 
interview may be obtained by determining 
whether it contributes to the efficiency of the 
selection process. Two clues to efficiency are 
the amount of time needed to make decisions 
and the confidence or certainty with which 
the decision maker regards his decisions. The 
importance of the time factor seems obvious 
in that a small amount of time saved on 
each of a large number of teacher selection 
decisions means a considerable saving in time 
and money to the school district. The signifi- 
cance of the certainty factor is based on the 
view that decisiveness in an administrator is 
a good quality, that uncertainty can lead to 
indecision, and that indecision can cause vacil- 
lation and wasted motion. This study answers 
the question of how interview information af- 
fects time and certainty in teacher selection 
decisions. 

The value of the teacher selection interview 
was investigated by determining the contribu- 
tion of interview information to (a) discrimi- 
nation and consistency of decisions (two 
factors necessary to predictive validity) and 
(b) time needed to make decisions and feeling 
of certainty regarding the decisions (two 
factors related to the efficiency of the selection 
process). The study was designed so that the 
effects of the audio and visual stimuli of the 
interview could be determined on these four 
dependent variables. 


PROCEDURE 


The Ss of the experiment were 144 elementary 
school principals, many of whom were involved in 
the selection process on a seasonal basis. The Ss 
were randomly divided into three equal-sized 
groups* for receiving interview information. One 
group received audiovisual information, another group 
received audio information only, and a third group 
did not receive interview information. In order to 
control the presentation of the interviews for the 
experiment, it was not possible to use “live” inter- 
views in which Ss actually interviewed applicants.5 
Consequently, interviews were scripted and role 
played to control content and filmed to control the 
presentation. The sound and color film of each ap- 
plicant was used for audiovisual treatment, while 
only the tape-recorded sound track of this film was 
used for the audio treatment. The camera focused 
on the applicant throughout the entire interview, and 
the interviewer remained anonymous. 

The Ss were oriented to a simulated teacher 
selection situation in which they considered eight 
fictitious applicants for a well-described, hypothetical, 
fourth grade teaching position. All Ss were provided 
written documents on each applicant. The documents 
were similar to those used by personnel directors, 
including information from letters of application, 
credentials, and recommendations. 

The general experimental task performed by each S 
was to make decisions about the appropriateness of 
each applicant for the position on the basis of the 
information available. Each S was asked to (a) 
estimate how each applicant would be evaluated on a 
Teacher Evaluation Instrument (TEI) at the end of 
1 yr. of teaching, (b) rank the eight applicants ac- 
cording to their suitability for the hypothetical situa- 
tion, and (c) indicate the degree of certainty of 
his judgments regarding the estimates on the TEI 
and ranking by indicating how willing he would be 
to bet that his judgments were correct. 

The above tasks were completed during the morn- 
ing session of the experiment. For purposes of 
measuring the consistency of the decisions, a retest 
was administered in the afternoon in the following 
manner. Five of the eight applicants presented in 
the first session were repeated in the afternoon 


*Three other variables that were manipulated in 
the study are not discussed in this article. They were 
number of documents, masking of information, and 
instructions regarding the processing of information. 

5In effect, the interviewing skill of S was not 
included in the experiment—his use of information 
from the interview was measured, however. 
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session, These five applicants were made to appear 
different by modifying certain minor data in their 
records, for example, age, birthplace, height, and 
weight. Changes in make-up, hairpieces, and clothes 
altered appearances during the filmed interview. The 
other three applicants used during the morning 
session were decoys and were replaced by consider- 
ably different applicants in the afternoon session. The 
decoys appeared late in the order of presentation in 
the first session and early in the second session to 
aid in forming the impression that the second set 
was an entirely new set of applicants. It was assumed 
that the insertion of the decoys did not affect de- 
cisions about the five applicants on whom repeated 
measures were taken. 

When these tasks had been completed, it was 
possible to measure the four dependent variables in 
the following way: 

1. Time was measured directly by the number of 
minutes required to complete the total task. 

2. Certainty was measured directly according to 
S’s willingness to bet that his judgments were correct. 
Two expressions were obtained (regarding judgments 
on the TEI and the ranking), and separate analyses 
were made of each. 

3. Discrimination was determined by computing 
the variance of the 16 applicant scores on each item 
of the TEI; the mean variance of all items was then 
used as a measure of discrimination. The greater the 
variance, the more discriminating the individual; the 
smaller the variance, the less discriminating. 

4. Two measures of consistency were obtained: 
(a) a correlation between the first and second rank- 
ing of the five “real” applicants, that is, omitting 
the three decoys from the morning and afternoon ses- 
sions; and (b) a correlation between the first and 
second estimates of how each of the five applicants 
would be evaluated on the TEI. The correlations were 
transformed by Fisher’s r to z transformation and the 
z scores used in separate analyses. 
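
The discrimination and consistency measures described in items 3 and 4 can be sketched as below. All data are hypothetical. The article does not say whether the sample or the population variance was used, nor which correlation coefficient was computed for the rankings, so the sample variance and a Pearson correlation on the ranks are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def discrimination(scores_by_item):
    """Mean, over TEI items, of the variance of one S's 16 applicant scores.

    Assumption: sample variance (n - 1 denominator); the article does not say.
    """
    return mean([variance(scores) for scores in scores_by_item])


def pearson_r(x, y):
    """Product-moment correlation; applied to ranks it equals Spearman's rho."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher's r to z transformation; undefined (infinite) when |r| == 1."""
    return math.atanh(r)


# Hypothetical morning and afternoon ranks of the five repeated applicants.
morning_ranks = [1, 2, 3, 4, 5]
afternoon_ranks = [2, 1, 3, 5, 4]

r = pearson_r(morning_ranks, afternoon_ranks)
z = fisher_z(r)
print(f"r = {r:.2f}, z = {z:.3f}")

# Hypothetical TEI scores for one S: two items, 16 applicant scores each.
tei_scores = [
    [3, 4, 5, 2, 6, 4, 3, 5, 4, 4, 2, 6, 5, 3, 4, 5],
    [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
]
print(f"discrimination = {discrimination(tei_scores):.3f}")
```

With only five applicants a perfectly repeated ranking gives r = 1, for which the transformation is not finite; how such cases were treated is not stated in the article.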


Development of the Simulated Interviews 


The development of the filmed interviews was 
subject to a number of constraints. First, because 
of time and fiscal limitations, the length of each 
interview was limited to 9 min. This made it neces- 
sary to display only the probing portion of the 
interview, which was considered to be the most vital 
part. Second, all applicants for the position were 
assumed to be at least minimally qualified for em- 
ployment, and all had presumably passed an initial 
screening interview although none of this initial 
interview information was available to Ss. Third, 
as a control measure, all applicants were female, all 
were between 22 and 28 yr. old, and all were of 
acceptable appearance; that is, none was at either 
extreme in terms of physical appearance. The con- 
trols provided a group of applicants that were rela- 
tively homogeneous with respect to these classifica- 
tion variables; relative homogeneity was necessary 
for testing discrimination among treatments. 

The problem confronted in preparing these materials 
was how to display the personality characteristics 
of the applicants in a 9-min. segment of the 
interview so as to permit them to be assessed and 
rated by Ss. Five factors were delineated by 
Ryans (1960) as significant in describing teacher 
behavior for the elementary teacher group in his 
study. These factors (described in detail in Ryans’ 
work but identified here as buoyancy, empathy, 
organization, originality, and sociability) were ma- 
nipulated among the 16 fictitious applicants in such 
a way that a personality was “created” in which 
each applicant was obviously high on one factor 
(e.g., originality); two other factors were less obvious,
but present in the interview behavior (e.g.,
buoyancy and sociability); and two other factors 
were not evident (e.g., organization and empathy). 

Once the personalities of the applicants had been 
created, it was necessary to prepare scripts that 
would display these characteristics in natural re- 
sponses during the filmed interview. Analytical and 
probing questions that suggested an extended answer 
and that might reasonably be asked in an interview 
were devised. Scripts were then written for each 
applicant in order to control the time element and 
the predetermined characteristics of the applicants. 

The individuals used to portray the fictitious ap- 
plicants in the filmed interviews were selected from 
senior University of Washington education majors. 
The actresses were selected on the basis of the extent 
to which they “fit” one of the fictitious applicants 
whose traits they would display on the filmed inter- 
views. The scripts were memorized and rehearsed 
until a natural aura pervaded the interview. Infor- 
mation in the documents given to Ss was comple- 
mentary to the interview information in that it 
portrayed the same characteristics. 

The rationale for the design of the interview was 
to display specific behavioral and personality factors 
that could be assessed by Ss. Because the study 
was designed to determine Ss’ certainty of assessment 
of the applicants and their ability to discriminate
among the applicants, the structure of each inter- 
view was developed so as to control stringently the 
characteristics displayed by the applicant in the 
interview, while retaining the realism and spontaneity 
of the situation as far as possible. 

Subsequent use of the filmed interviews in the 
research project and related presentations obtained
an overwhelmingly favorable response to their realism.
As was expected, however, a generally negative
response was elicited concerning the narrow focus 
of the interview segment; some reactors thought it 
failed to display enough of the “total personality” 
of the applicant, despite the fact that all of the major 
dimensions to be assessed in the experiment were 
evident in the interview and repeated in related 
documents. 


RESULTS 


Separate analyses of variance were made
for the effect of the interview information on
time, discrimination, certainty, and consistency.
The analyses indicated that there were
effects on time, discrimination, and certainty;
but whether an S received no interview information,
audio interview information, or audiovisual
interview information had no effect
on the consistency with which he ranked
the applicants or estimated consequences on
the Teacher Evaluation Instrument (TEI).
These results are shown in Table 1.


TABLE 1

EFFECT OF VARIOUS LEVELS OF INTERVIEW INFORMATION ON TIME, DISCRIMINATION,
CERTAINTY, AND CONSISTENCY

                        Level of interview information a
Dependent measure     Audiovisual     Audio      None     MS (effect)   MS (error)    F b
Time                     274.1        283.8     219.1      58392.1       1317.8      44.3**
Discrimination           1.150        1.188      .925         .969         .190       5.10*
Certainty
  On TEI                 9.625        8.646     8.958       12.007        2.887       4.16*
  On ranking             9.917        8.500     9.042       24.528        4.074       6.02**
Consistency
  On TEI                  .371         .369      .428         .055         .059        --
  On ranking              .549         .628      .750         .489         .359       1.36

Note.—TEI = Teacher Evaluation Instrument.
a Entries are mean scores.
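
As a rough check on Table 1, each F value appears to be the ratio of the two mean-square columns. The sketch below assumes that the first mean square is the treatment (effect) term and the second is the error term, and that the consistency entries carry a leading decimal point; both are readings of the printed table, not statements from the authors. The names `f_ratio` and `table_1` are hypothetical.

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F statistic as the ratio of the effect mean square to the error mean square."""
    return ms_effect / ms_error


# (MS effect, MS error) pairs as read from Table 1.
table_1 = {
    "time": (58392.1, 1317.8),
    "discrimination": (0.969, 0.190),
    "certainty_tei": (12.007, 2.887),
    "certainty_ranking": (24.528, 4.074),
    "consistency_tei": (0.055, 0.059),
    "consistency_ranking": (0.489, 0.359),
}

# Recompute each F to two decimals for comparison with the printed column.
f_values = {name: round(f_ratio(*ms), 2) for name, ms in table_1.items()}
```

Under these assumptions the recomputed ratios agree with the printed F column, and the consistency-on-TEI ratio falls below 1, which would explain why no F is reported for that row.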

Interview information affected the time re- 
quired to make decisions in the following way. 
Audiovisual and audio information were not 
significantly different,6 but both required more
time to reach a decision than no interview
information. This was a natural consequence 
of the time needed to obtain the information 
from the interview and was anticipated as a 
result of the differences in treatment. 

With relation to discrimination on esti- 
mating ratings on the TEI, the results of 
audiovisual and audio interview information 
were not significantly different, but in each 
case the results were more discriminating than 
no interview information. This result indicates 
that information was obtained from the con- 
tent of the interview that was not obtained 
from the written documents but that seeing 
the applicant did not make S more discrimi- 
nating than listening to the applicant. 

6 A Newman-Keuls test was used for all post-analyses
of means, and the significance level required
was at least .05.

The Ss were required to estimate their feeling
of certainty regarding two tasks: their
estimates of ratings on the TEI, and their 
ranking of the applicants. On the estimates 
of consequences on the TEI, there was sig- 
nificantly more certainty expressed with 
audiovisual than with audio interview infor- 
mation. No interview information yielded a 
mean score that was not significantly different 
from the other scores. On the feelings of 
certainty about ranking of applicants, audio- 
visual interview information yielded signifi- 
cantly more certainty than either audio infor- 
mation or no interview information. For each 
estimate of certainty (i.e., ratings on the TEI, 
and rankings of applicants), there was more 
certainty expressed when audiovisual informa- 
tion was received than when only audio in- 
formation was received. Even though this 
increased certainty was expressed, the decisions
were no more discriminating, no more
consistent, and took no less time than when 
only the audio information was received. This 
suggests that seeing an applicant in an inter- 
view does not affect the decision itself as 
much as it does the confidence of the decision 
maker. 

The results of the study do not answer 
questions about whether the nature of inter- 
view information affects the types of cues 
attended to (as suggested by Giedt; Levine 
& McGuire; Sydiaha; and the Springbett 
studies), but they do extend the prior findings
by suggesting that:

1. Consistency of ranking of applicants or 
of estimating consequences is not significantly 
affected by interview information. It was 
anticipated that the information included in 
the interview (both the content and the ex- 
pressive cues) would increase consistency, but 
such was not the case. It may have been that 
the compatibility of the information in the 
interview with the information in the docu- 
ments contributed to these results. 

2. Interview information did increase 
the discrimination of Ss, as anticipated, 
but seeing applicants was no more helpful 
than hearing them. These results are different 
from the results of prior studies which sug- 
gest that expressive cues adversely affect de- 
cisions; they appear to be more compatible 
with Giedt’s results where content of the inter- 
view was the predominant influence on diag- 
nostic decisions. However, it should be remem- 
bered that prior studies were not concerned 
with the measurement of discrimination. 

3. Seeing applicants increased Ss’ feelings 
of certainty about applicants. This result, too, 
is compatible with Giedt’s findings, but 
extends them by indicating that while the 
judgments made in the selection process may 
not be affected by visual cues, the feeling of 
certainty regarding these judgments may be 
increased. Certainty under these conditions 
may be a function of a lack of experience 
with audio stimuli only and may be amenable
to training. If so, there would be no
apparent disadvantage to audio interviews, 
for example, via telephone. 


4. As anticipated, total time taken for the 
decision-making task was greater when inter- 
view information was received. It was thought 
that the uncertainty accompanying no inter- 
view information might cause sufficient vacil- 
lation to increase the time; but if this in- 
crease occurred, it was insufficient to offset 
the time spent with the interview information. 


The use of nonhuman organisms to perform human tasks is discussed. It is
pointed out that application of animal behavior to human activity is far from
new. Recent “crackpot” ideas are reviewed and suggestions for judicious
extension of the principle are made.


In spite of the technological explosion in the 
West, a good deal of human activity still in- 
volves arduous, monotonous, and even degrad- 
ing physical labor and, sometimes, consider- 
able danger. The most obvious examples of 
such activity include jobs such as painting and 
maintenance of tall buildings and bridges, fruit 
picking, bomb disposal, garbage collection, 
mass production assembly, etc. It should be 
made clear that, in many cases, man is engag- 
ing in activities to which he, as an organism 
on the phylogenetic scale, is not ideally suited 
and that could be performed more efficiently 
and profitably by alternative organisms. 

This is by no means a new idea. Herrnstein
(1965) pointed out that “Whenever human 
beings are paid a wage for the use of their 
sense organs rather than for intelligence or 
judgment, it is likely that they could be re- 
placed by animals, economically and easily 
[p. 103].” The problem, as Cumming (1966) 
noted, is that we sometimes have difficulty
“telling when the unique capacities of the 
human organisms are largely wasted in trivial 
performances that lower organisms are per- 
fectly capable of mastering and better able 
than we to tolerate [pp. 246-247].” 

The purpose of the present article is to 
demonstrate that substitution of nonhuman 
organisms in the performance of typically hu- 
man tasks has already shown its feasibility 
and usefulness and to suggest in addition that 
systematic expansion of such substitution is 
desirable. 


1 Requests for reprints should be sent to Douglas
A. Bernstein, University of Illinois, Psychological
Clinic, Children’s Research Center Building, Champaign,
Illinois 61820.


Existing Data on the Use of Alternative
Organisms


It hardly seems necessary to document the
role of nonhuman organisms in the life of man.
Animals have been employed for thousands of
years in a wide variety of ways, most of which 
involve simple exploitation of their already 
available response repertoires. Dogs herd cat- 
tle and sheep, hunt and track, guard property, 
and lead the blind. Pigeons carry messages; 
horses and other large animals aid in agricul- 
tural activities and provide transportation, 
while nearly every kind of nonhuman from
flea to elephant acts as an entertainer (e.g.,
Breland & Breland, 1951, 1961). The list can be
lengthened almost endlessly, though this need 
not be done to illustrate the point that men 
and animals have led, and continue to lead, 
interacting lives. 

With the advent of World War II, the 
range of tasks for which animals are used 
began to expand, though again basic response 
repertoires were exploited, not modified. As 
one might expect, most of the new tasks were, 
at first, military. For example, Skinner (1956) 
noted that dogs were trained by the Russians 
to disable tanks by running close alongside 
with magnetic mines, and that the British 
trained sea gulls as submarine detectors (Skin- 
ner, 1960). In the latter case, British sub- 
marines were sent out along the coast to re- 
lease food while running submerged. The birds 
soon learned to follow submerged vessels and 
were never taught a German-British dis- 
crimination. 

Skinner (1960) also mentioned (though did 
not vouch for) a report of the Russians’ use 
of sea lions to sever cables attached to floating 
mines. The procedure apparently involved fit- 
ting the animals with an electrical cutting de- 
vice that, when correctly brought into the 
proximity of a cable, both closed the blades 
and dispensed a few fish from a small tank. 
When battery power became insufficient to 
drive the blades, a discriminative stimulus sent 
the animals back to base for resupply. 

Perhaps the most famous wartime use of
alternative organisms was in the pigeon guid- 
ance system, designed and perfected by Skin- 
ner, which allowed a winged aircraft, loaded 
with explosives, to be brought in on a desig- 
nated target by a team of on-board pigeons 
that pecked at a visual target display. The 
details of the arrangement are available else- 
where (Skinner, 1960). Even though the sys- 
tem worked with great accuracy and was dem- 
onstrated to be feasible, it was rejected by a 
group of scientists because “the spectacle of 
a living pigeon carrying out its assignment, 
no matter how beautifully, simply reminded 
the committee of how utterly fantastic our 
proposal was [Skinner, 1960, p. 34].” 

Another rejected wartime plan suggested 
dropping thousands of “incendiary bats” on 
enemy cities. Each bat would have carried an 
incendiary time bomb, all of which would have 
exploded simultaneously after the bats had 
settled under eaves, in attics, and elsewhere 
(Skinner, 1960). In spite of the rather bleak 
reception given such imaginative proposals in 
the past, the expanded use of animals in 
military settings is still a very real, though 
not necessarily desirable, possibility. It already 
has been demonstrated, for example, that 
pigeons might be used to analyze photo reconnaissance
material (Herrnstein & Loveland,
1966), and there is apparently some classified 
research underway relating to the military 
uses of pigeons (Herrnstein, 1965). 

Nonmilitary uses of alternative organisms 
in which no significant repertoire modifications 
were made have been reported also. Probably 
the most significant of these has been in the 
area of quality control in industrial settings. 
Verhave (1966), working at a large pharma- 
ceutical company, trained pigeons to inspect 
gelatin capsules for defects. The birds did so 
by pecking one of two keys. In no case did 
birds approach or make contact with the 
capsules, and their performance after a week 
of training had reached 99% correct. How- 
ever, the system was rejected by the company, 
apparently because it feared adverse conse- 
quences for public relations. 

This project, while it lasted, stimulated 
Cumming (1966) to develop a similar pro- 
cedure at another industrial firm where, in 
this case, the pigeons were trained to inspect 
diodes for paint defects. The results gave every
indication that the birds were capable of 
levels of speed, accuracy, and endurance that 
surpassed those of human inspectors, but 
again, the idea of incorporating the system 
into the production line of the company was 
rejected, this time due to questions about the 
response of animal lovers and, perhaps more 
realistically, organized labor. 

So far we have been talking about using 
alternative organisms whose behavioral reper- 
toires are left essentially intact. There are, 
however, many examples in which significant 
repertoire modifications have been accom- 
plished, that is, cases in which animals are 
trained not simply to, say, peck at a key 
rather than at grain, but to emit new responses 
not seen in the naturally occurring behavior of 
the organism. Such modifications are most evi- 
dent in cases where animals are trained as en- 
tertainers. Breland and Breland (1951, 1961) 
report “acts” involving pigs that operate 
vacuum cleaners, chickens that bat and 
retrieve balls, raccoons that deposit coins 
in a bank, etc., and it is easy for all of us 
to recall having seen countless other animal 
entertainers who behave in similarly “non- 
animal” ways. 

But perhaps the most exciting examples in 
which behavior repertoires are modified in- 
volve the primates, for here we have an alter- 
native organism whose physical structure (and 
thus, potentially, whose behavior) most closely 
approximates that of man himself. “Monkey
acts” have long been among the biggest attractions
at zoos, circuses, and in Tarzan
movies, partly because these animals can do
so many “near-human” things. It has not been
until recently, however, that the potential use- 
fulness of primates as alternative organisms 
for nonentertainment purposes has begun to 
gain recognition. 

Not all of the applications have been legal. 
Clarke (1958) reported a case on the New 
York City police blotter in which a chim- 
panzee had been trained to burglarize upper- 
floor apartments by climbing the side of the 
building and entering through a window. 

The most spectacular case, which is really 
no more than a monkey act that got out of 
hand, was reported on a short-wave radio 
broadcast from Johannesburg, South Africa. 
Apparently, South Africans have trained baboons
to drive tractors and are employing
them in the planting, cultivating, and harvesting
of crops in that country. (An inquiry,
requesting further details and references, was
sent to Radio South Africa, but was ignored.)

The preceding series of what are, to us, im- 
pressive examples of the use of alternative 
organisms has been presented in order to make 
one major point: A good deal of work has now 
been done, both in earnest and in jest, which, 
when viewed in light of an ever evolving ex- 
ternal control technology begun by Skinner 
(as well as of an internal control technology, 
e.g., Delgado, 1963, 1965), clearly demon- 
strates the potential for employing nonhumans 
in a range of activities that is limited only 
by the bounds of our ingenuity and the pres- 
ence and nature of species-specific charac- 
teristics. 


Suggested Applications and Extensions 


Given the existing data and the presence of 
the technology, it does not seem unreasonable 
to make a series of suggestions for ways in 
which alternative organisms might be em- 
ployed in the future. 

The most obvious areas of attention are 
those in which some evidence of success is al- 
ready available. Thus, it is suggested that ef- 
forts be made to expand the use of alternative 
organisms of various kinds in agricultural set- 
tings (e.g., primates as fruit pickers, tractor 
operators, general maintenance workers, etc.) 
as well as in quality control (pigeons are the 
obvious choice here) and certain types of as- 
sembly work (primates would probably do 
very well at this). 

Moving into hitherto unexplored areas,2 it
seems perfectly sensible to investigate the 
feasibility of training baboons or other pri- 
mates to act as garbage or litter collectors, 
dock workers, window washers, street sweep- 
ers, and bellboys, to name a few possibilities. 
Because of their behavioral potential (and 
their expendability), primates might also make 
excellent astronauts on certain types of mis- 
sions. 

2 Some of the following suggestions (and some
others that do not appear here) have also been made
by Clarke (1958).


DOUGLAS A. BERNSTEIN AND THOMAS M. ALLOWAY


A variety of animals might be employable
as rural (or even urban) mailmen. Porpoises
could be trained systematically to cruise as 
resident life guards off beaches and lake shores 
(it has been reported anecdotally that these 
animals have saved swimmers’ lives even with- 
out such training), and pigeons, it would seem, 
would make excellent radar screen observers, 
especially where constant surveillance is re- 
quired in areas (such as along the DEW line) 
where the climate makes the continued pres- 
ence of human personnel absurd. 

These suggestions are meant only to stimulate
further thought and additional ideas.3
They are offered in the hope that it may some 
day be possible to relieve man of “some of the 
more odious . . . tasks on which the capabil- 
ities of human beings for extremely complex 
judgments and decisions are wasted [Ulrich,
Stachnik, & Mabry, 1966, p. 238].” 

Readers unacquainted with the literature 
on animal training may be understandably 
skeptical about the possibility of training ani- 
mals to perform tasks such as those proposed 
here. The techniques for doing so, however, 
are actually rather straightforward. In many 
instances, the first step would be to place the 
animal on a “token reward system” in which 
correct behavior is maintained by giving the 
animal a poker chip or other easily dispensed 
token, which he may spend later to obtain 
the necessities and luxuries of life, such as 
food or his mate. The procedure for establish- 
ing the value of tokens is a simple one (Wolfe, 
1936) in which the animal is trained to insert 
the tokens into a vending machine to obtain 
food. Once the animal has learned to do this, 
the tokens themselves take on reward value 
and can be used to establish and maintain any 
behavior of which the animal is capable. Ani- 
mals, like men, can be trained to work for 
wages. 

3 With reference to his pigeon guidance system,
Skinner (1960) wrote that “One virtue in crackpot
ideas is that they breed rapidly and their progeny
show extraordinary mutations.”


USE OF ALTERNATIVE ORGANISMS


Once the animal is established in his “token
economy,” job training can proceed by means
of a procedure known as “shaping,” in which
the animal is rewarded for increasingly accurate
approximations of the desired response.
In this way, animals can be taught to do things
that are quite unlike anything they might do
spontaneously. Let us take the hypothetical
case of a chimpanzee window washer. 
Training would consist of the gradual estab- 
lishment of the desired behavior by systemat- 
ically rewarding the animal first when han- 
dling sponges and rags, then only when these 
implements were being applied to a model 
window, and finally only when they were being 
used in a manner that efficiently eliminated 
dirt from the window. When window-washing 
behavior had been well established, the animal 
would be taught to climb ladders in order to 
reach windows, to position ladders, and so on. 
Ultimately, the animal might be equipped with 
a device that would dispense a token whenever 
a window had been properly washed.4

Purely emotional objections notwithstanding,
the major problem with which one is faced
when contemplating implementation of ideas 
such as those presented here is that of what 
is “to be done” with the humans who are re- 
placed by alternative organisms. A solution to 
this problem lies not in condemning the ideas 
and procedures that create it (progress usually 
results in disruption of the status quo), but in 
building the cultural and social resources and 
machinery needed to make it possible for all 
individuals to behave at a level commensurate 
with their intellectual endowment. This need
not and should not result in an Orwellian
civilization. Rather, the results of the judicious
use of alternative organisms should allow for
maximum individual freedom and development.
We are advocating ideas that contain
features designed to augment, not diminish,
the characteristics of man’s existence, and it
seems to us that this cause may be advanced
when, as Cumming (1966) so beautifully put
it, we learn when to “send a bird to do a
bird’s job [p. 247].”

4 When training primates, shaping may not even be
necessary. The systematic use of modeling procedures
might be entirely sufficient for the establishment of
many skills.


The number of pieces of mail sorted in a specified time was compared for two
groups, each composed of 10 student volunteers. One group used standard
post office department sorting techniques while the other used color-coded
street names. Using a t test, the null hypothesis of no difference was rejected
at the .05 level of confidence. The innovative technique was found to be
superior to the conventional one.


Current mail-sorting techniques for mail 
carriers may not be as efficient as they might 
be. The sorting of mail is a slow and tedious 
process, especially for new carriers. They are 
faced with a sorting case with many street and 
box numbers on it, arranged according to the 
way they will be distributed along their route. 
Because of this arrangement, the case is con- 
fusing and the streets and addresses are diffi- 
cult for the novice to find. Faced with this 
array and with several stacks of mail that must 
be sorted accurately, it is no wonder that 
initial sorting is slow and that speed is gained 
only after much practice. 

Robert McFadden, a route carrier in Jack- 
sonville, Florida, suggested that color coding a 
mail-sorting case could increase significantly 
the number of pieces of mail sorted in a given 
time. Such a procedure might also decrease the 
number of letters sorted erroneously. Hence, 
fewer letters would be delivered to incorrect 
addresses, which can result in a 1- or 2-day 
delay in delivery. 


PROCEDURE 


A standard mail-sorting case, such as that depicted 
in the City Carrier’s Instruction Handbook (Bureau 
of Operations, no date), was used by both control 
and experimental groups. It was programmed for 12 
streets and used seven shelves with forty 1-in. separations
on each shelf. There were, thus, 280 slots for
sorting mail. The two groups were each composed
of 10 student volunteers.

1 Requests for reprints should be sent to Dell Lebo,
Child Guidance Clinic, 1635 St. Paul Avenue, Jacksonville,
Florida 32202.

Prior to sorting, 500 envelopes were stuffed with 
dummy letters and addressed by hand by approximately
120 other students. Each student addressed
4 envelopes from the route to be used. This technique 
provided the sorter with a variety of handwriting 
such as a carrier might experience in actual practice. 
Care was taken to include at least one letter for each 
slot of the sorting case. 

These letters were then distributed into six num- 
bered stacks of mail for sorting. In order to eliminate 
bias with respect to any one stack of mail being 
favored as to the location of its letters in the sorting 
case, both groups sorted stacks in the same order. 
Thus, there was no opportunity for the letters them- 
selves to offer a source of variation in the sorting 
process. 

The control group sorted mail using the conven- 
tional technique as described in the Handbook. Each 
student was given a total of 10 consecutive trials con- 
sisting of 10 min. per trial. Two trials per day were 
given with the stipulation that a minimum of 2 hr. 
must intervene between trials. Trials were completed 
within 2 wk. Upon completion of each trial the 
average number of pieces of mail sorted and errors 
in sorting were noted. 

The experimental group followed the procedure of 
the control group with major differences. These dif- 
ferences consisted of, first, using color-coded names 
for the 12 streets located on the sorting case. Since 
house numbers and streets were arranged by route 
in the case, colors appeared on various shelves of the 
case as well as, in some instances, running the length 
of the shelf. This change made the identification and 
location of the streets easier for someone new at the 
sorting task. 

The sorting procedure was also changed for the 
experimental group in that they were asked to spend 
10 min. memorizing or associating street names with 
their particular color before beginning sorting. The
control group spent a similar 10-min. period simply
reviewing the case as described in the Handbook. As
the associations between colors and streets became
stronger, a new carrier should be able to sort mail
faster because of a reduction in the time necessary
to locate a street on the sorting case.

FIG. 1. Average number of letters cased or sorted per trial for those using the conventional
post office method and those using the method of color coding the streets.

As incentives several rewards were offered the stu- 
dent volunteers. The first prize for both groups was a 
3-day all-expense-paid trip to the Bahama Islands. 
A deep sea fishing trip was given to second and third 
place winners of both groups, and steak dinners were 
afforded “the losers.” 


RESULTS 


Figure 1 clearly shows that the experimental 
group, who used color-coded street names, 
sorted more mail in each and every trial than 
did the control group using the conventional 
sorting method. 


It is also apparent that the experimental 
group was learning the route at a faster rate 
than the control group. By the tenth trial the 
sorters using the color coding were sorting an 
average of 18.5 letters per minute, while those 
employing conventional sorting were sorting at 
a rate of 13.5 letters per minute. This difference
was significant according to the t test at
better than the .05 level of confidence. 

Extrapolating these results to a 1-hr. period 
would suggest that those sorters using color- 
coded street names could sort 300 more pieces 
of mail than those employing the conventional 
sorting technique. 
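
The extrapolation above is simple arithmetic and can be sketched as follows. The sketch assumes that the tenth-trial rates reported in the text hold constant over the whole period, which the study did not test; the constant and function names are hypothetical.

```python
# Tenth-trial sorting rates reported in the text (letters per minute).
COLOR_CODED_RATE = 18.5
CONVENTIONAL_RATE = 13.5


def extra_pieces(minutes: float) -> float:
    """Additional letters sorted with color coding over `minutes`,
    assuming both tenth-trial rates stay constant for the whole period."""
    return (COLOR_CODED_RATE - CONVENTIONAL_RATE) * minutes


# One hour of sorting at the tenth-trial rates.
hourly_gain = extra_pieces(60)
```

A difference of 5 letters per minute sustained for 60 min. gives the 300-piece figure quoted in the text.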

The experimental group also made fewer 
errors sorting mail. The difference was not
statistically significant, however.

The inference here is clear. New mail carriers
could learn a sorting case more rapidly
from the start and therefore sort more mail
in a given period of time, if the street names
on the sorting case were color-coded.


REFERENCE

BUREAU OF OPERATIONS, Distribution and Delivery
Division. City carrier’s instruction handbook; M-41.
Washington, D. C.: United States Post Office Department,
no date.

(Received January 27, 1969)


Fiedler’s contingency model suggests that task-oriented leaders are more
effective where the leadership situation is either very favorable or very unfavorable
and that relations-oriented leaders are more effective in situations
of intermediate favorability. This model was tested among supervisors in
both interacting and coacting groups in two organizations. Results in the
hypothesized direction were attained although they were not generally
significant.


One of the most perplexing problems con- 
fronting managers has been to determine the 
leadership style most conducive to promoting 
effective work groups. Empirical studies di- 
rected toward finding that style which is most 
effective have yielded inconclusive and often 
contradictory results (Blake & Mouton, 1964; 
Fiedler, 1958; Lewin, Lippitt & White, 1939; 
Likert, 1961; Shaw, 1955). Although some 
theoreticians have been perplexed by the dif- 
ficulty in identifying the one best leadership 
style, many practical supervisors have viewed 
the leadership literature with amusement as 
they have long recognized that both the direc- 
tive, authoritarian, task-oriented leader and 
his counterpart, the democratic, human relations
leader have proved effective in countless
situations. The Contingency Theory of Lead- 
ership Effectiveness recently advanced by 
Fiedler (1964) suggests a theoretical explana- 
tion for both the confusion which now exists 
in the literature and the practical insights of 
many managers. 

This theory suggests that leadership is an 
influence process where the ease or difficulty 
of exerting influence is a function of the 
favorableness of the group task situation for 
the leader. Although it has been recognized 
that the favorableness of each group task 
situation may depend on different variables, 
the three most commonly acknowledged de- 
terminants stated in their order of importance 
are leader-member relations, task structure, 
and position power. Once these variables have 
been measured, they can be ordered into
eight cells along a continuum to illustrate the
relative degree of favorableness in a task situa-
tion as shown in Figure 1.

1 Requests for reprints should be sent to the au-
thor, Department of Management, University of
Florida, Gainesville, Florida.

The most favorable situation exists when 
the leader enjoys good leader-member rela- 
tions, is supervising a structured task, and 
possesses strong position power (Cell 1). The 
favorableness of the group task situation de- 
creases as leader-member relations change 
from good to moderately poor; the most un- 
favorable situation is one where the leader- 
member relations are moderately poor, the 
task is unstructured, and position power is 
weak (Cell 8). The theory predicts that the 
task-oriented leader will be more effective in 
those situations which are either very favor- 
able (Cells 1, 2, 3) or very unfavorable (Cell 
8) and that the relations-oriented leader will 
be more effective in situations intermediate in 
favorableness (Cells 4, 5, 6, 7).
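
The cell numbering described above can be written out as a short sketch. The function names are illustrative only and are not part of Fiedler's model. The ordering of the three dichotomies is inferred from the cells named in this article (Cells 1 and 8 here, and the cells tested in the Results section), so it should be read as an assumption rather than a definitive statement of the model.

```python
def favorability_cell(good_relations: bool, structured_task: bool,
                      strong_power: bool) -> int:
    """Return the cell number, 1 (most favorable) to 8 (least favorable).

    Leader-member relations vary slowest, then task structure,
    then position power.
    """
    return (1
            + 4 * (not good_relations)
            + 2 * (not structured_task)
            + 1 * (not strong_power))


def predicted_style(cell: int) -> str:
    """Leadership style the theory predicts to be more effective in a cell."""
    return "task-oriented" if cell in (1, 2, 3, 8) else "relations-oriented"


# Assembly line foremen with moderately poor relations:
# structured task, weak position power.
cell = favorability_cell(good_relations=False, structured_task=True,
                         strong_power=False)
print(cell, predicted_style(cell))
```

A negative correlation between LPC score and effectiveness is expected where the task-oriented (low LPC) leader is predicted to do better, and a positive correlation where the relations-oriented (high LPC) leader is predicted to do better.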

The empirical basis from which the con- 
tingency theory was induced is impressive: 
over 50 studies of 21 different types of groups. 
Recent studies (Blanchard, 1967; Fiedler, 
1966; Hunt, 1967; Shaw & Blum, 1966) have 
tended to support this theory in interacting 
groups and have suggested that it may also be 
applicable in coacting groups (Hunt, 1967). 
The purpose of this study is to provide ad- 
ditional tests of the contingency model in in- 
teracting and coacting groups in real life or- 
ganizations. 


METHOD 


An empirical test of the contingency model re- 
quires the following information: a measure of the 
supervisors’ leadership styles, the classification of 
group supervisors into cells on the basis of leader- 
member relations, task structure and position power, 
a measure of leadership effectiveness, and the de-
termination of the correlations between leadership
style and managerial effectiveness in each cell to be
tested.

Fig. 1. Correlations between leaders’ Least Preferred Co-worker scores and group effectiveness.

Supervisors were classified as task-oriented or re- 
lations-oriented on the basis of Fiedler’s (1966) Least 
Preferred Co-worker (LPC) score. Leader-member 
relations were classified as good or moderately poor 
as a result of the leaders’ perceptions of group at- 
mosphere which were indicated by their responses 
to 10 semantic differential statements describing the 
group atmosphere (Fiedler, 1967). These responses 
were aggregated and divided into high and low 
group atmospheres (good and moderately poor leader-
member relations) by taking the top and bottom
third of the scores. Tasks were classified as struc- 
tured or unstructured, and position power was de- 
fined as strong or weak by three judges completing 
questionnaires adapted by Hunt (1967). Leadership 
effectiveness was measured by asking Ss’ immediate 
supervisors to rate their performance on relevant 
job duties and personal characteristics considered es- 
sential to job performance. Spearman’s rank order 
correlation was employed to measure the relation- 
ship between leadership style scores and performance. 
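
The two computational steps in this paragraph, the split into outer thirds and the rank order correlation, can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run for the study: the article does not say how tied scores were handled (average ranks are assumed here), and the rule for uneven group counts is taken from the table footnotes, which assign leftover groups to the discarded middle set. All function names are hypothetical.

```python
def outer_thirds(scores):
    """Return (bottom_third, top_third) as lists of indices into scores.

    The middle set, including any remainder, is dropped.
    """
    n_third = len(scores) // 3
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:n_third], order[len(order) - n_third:]


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the product-moment correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

With 28 groups, `outer_thirds` keeps 9 groups at each extreme and leaves 10 in the middle set. Within one extreme, `spearman_rho(lpc_scores, ratings)` would then give the correlation for the corresponding cell.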


Subjects 


The main consideration in the selection of or- 
ganizations to be included in this study was that a 
sufficient number of supervised groups could be 
found performing both structured and unstructured 
tasks. This criterion was met in an electronics firm 
and a teaching hospital. The existing level of ac-
tivity in the electronics firm enabled the author to in-
vestigate 28 groups performing unstructured state-of-
the-art engineering tasks and 28 groups performing
structured assembly line operations. Since the nature
of these tasks required a high degree of interde- 
pendency, the groups were judged to be interacting. 
The hospital afforded an opportunity to study 23 
nursing groups whose tasks were judged to be un- 
structured and 25 groups performing structured tasks 
such as accounting, housekeeping, and routine main- 
tenance. Since the hospital groups performed their 
work without a high degree of interdependency, they 
were judged to be coacting groups. The researcher 
met with the supervisors of these groups, explained 
that he was attempting to predict leadership effective-
ness, and assured them that the results of the study would be
confidential. Twenty-eight assembly line foremen, 26 
engineering supervisors, 23 nursing supervisors, and 
25 managers from patient-support activities agreed to 
participate in the study and completed the required 
questionnaires. 


RESULTS 


Since the contingency model was tested in 
both interacting and coacting groups, results 
are reported and discussed separately. 


Interacting Groups 


Questionnaire returns from the electronics 
firm enabled the analysis of 28 structured and 
26 unstructured task groups. Since super- 
visors of structured groups (assembly line 
foremen) were judged to have weak position 
power, a separation of these Ss into those 
having good and moderately poor leader-mem- 
ber relations allowed tests of the contingency 
model in Cells 2 and 6. Since supervisors of
unstructured groups (engineering supervisors)
were judged to have strong position power, a 
separation of these Ss into those having good 
and moderately poor leader-member relations 
enabled tests of the model in Cells 3 and 7. 
The calculation of Spearman’s rho shown in 
Table 1 indicated correlations in the predicted 
direction for Cells 2, 3, and 7, although none 
of them reached an acceptable significance 
level. Cell 6 revealed a correlation in the op- 
posite direction from that predicted by the 
model although it was not significant. 


Coacting Groups 


Questionnaire returns enabled the analysis 
of 25 structured and 23 unstructured task 
groups in the hospital. Since all supervisory 
positions were judged to have strong position 
power, the separation of managers of struc- 
tured task groups (patient-supporting activ- 
ities) into those having good and moderately 
poor leader-member relations allowed a test of 
the contingency model in Cells 1 and 5 while 
the division of unstructured task groups 
(nursing supervisors) enabled tests of the 
model in Cells 3 and 7. The calculation of 
Spearman’s rho shown in Table 2 indicated 
that all correlations were in the hypothesized 
direction although only Cell 5 reached a sig- 
nificance level of .05. 


DISCUSSION 


Before the contingency model can be ac-
cepted as a valid theory of leadership effec-
tiveness, many successful replications must be
performed. Each cell in the model should be


TABLE 1

SPEARMAN RANK ORDER CORRELATIONS BETWEEN
SUPERVISORS’ LEAST PREFERRED CO-WORKER SCORES
AND LEADERSHIP EFFECTIVENESS IN AN
ELECTRONICS FIRM

Cell tested    Nᵃ    Spearman’s rho
     2          9        -.097
     3          8        -.291
     6          9        -.238
     7          8        +.619

ᵃ Since the number of structured and unstructured groups was
not divisible equally into three groups, the remaining groups
were assigned to the middle set and not included in the calcu-
lations.
TABLE 2

SPEARMAN RANK ORDER CORRELATIONS BETWEEN
SUPERVISORS’ LEAST PREFERRED CO-WORKER SCORES
AND LEADERSHIP EFFECTIVENESS IN
HOSPITAL STUDY

Cell tested    Nᵃ              Spearman’s rho
     1          7              [illegible]
     3          8              -.214
     5          [illegible]    [illegible]*
     7          8              .029

ᵃ Since the number of structured and unstructured groups
was not divisible equally into three groups, the remaining groups
were assigned to the middle set and not included in the calcu-
lations.

* p < .05.


treated as a separate hypothesis and all studies 
pertaining to a specific cell should be com- 
bined for purposes of ascertaining whether or 
not a correlation does exist. Only after it has 
been established that a correlation does exist 
will it prove fruitful to study the nature of 
the relationship through means of regression 
models. Thus, the studies reported in this 
paper can do no more than provide additional 
information concerning specific cells in the 
model. 


Interacting Groups 


The electronics firm investigation provided 
tests of the contingency model in Cells 2, 3, 
6, and 7 in interacting groups as shown in 
Table 1. A comparison of the split-group 
correlations obtained in this study with the 
contingency model predictions indicates that 
the correlations in Cells 2, 3, and 7 support 
the hypothesis although none of the correla- 
tions reached a significance level of .05. The 
correlation obtained from Cell 6 was in the 
direction opposite to that predicted by the 
model. 

Although none of the correlations in Cells 
2, 3, and 7 reached an acceptable level of 
significance, they do fit generally into the re- 
sults reported by Fiedler as shown in Figure 
1, and thus a partial confirmation appears 
warranted. The discrepancies between the cor- 
relations reported in this work and those of 
Fiedler may be explained by several factors. 
First, the number of groups investigated in
this study was smaller than the number
utilized by Fiedler. For a correlation of a
given size, statistical significance is easier to
reach when the sample is larger. Thus, if more
groups had been investigated, significant cor-
relations might have been attained. Second,
group effectiveness in Fiedler’s original studies 
was always defined as measured performance 
stated in terms of such things as physical out- 
put, contest outcomes, and deviations from 
an intended target. The measures of group 
effectiveness employed in the studies reported 
in this article were based on effectiveness rat- 
ings by higher echelon superiors of the super- 
visors whose groups were studied. This pro- 
cedure may introduce factors other than the 
actual performance of the group such as the 
bias of the evaluator. There was no way to 
measure this possibility. Thus, the method of 
effectiveness rating could account for some of 
the discrepancies between the correlations re- 
ported here and those discovered by Fiedler. 
Third, the definition of favorability used in 
the studies reported here was that originally 
espoused by Fiedler; that is, leader-member 
relations, task structure, and position power. 
Since the work performed was of a highly 
technical nature, it may be that the technical 
ability of the supervisor should have been a 
factor in the definition of the favorability 
dimension. The design of the study did not 
provide an opportunity to include this condi- 
tion. 
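
The first point above, that small samples make significance hard to reach, can be checked directly for rank correlations by enumerating every possible ordering. The sketch below is an illustration, not the test used in the study. It assumes untied ranks and a one-tailed test in the predicted direction, and the function names are hypothetical.

```python
from itertools import permutations


def rho_no_ties(perm):
    """Spearman's rho between the ranks 1..n and a permutation of them."""
    n = len(perm)
    d_squared = sum((i + 1 - r) ** 2 for i, r in enumerate(perm))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def exact_upper_tail_p(n, rho_observed):
    """P(rho >= rho_observed) when all n! rank orderings are equally likely."""
    count = total = 0
    for perm in permutations(range(1, n + 1)):
        total += 1
        # Small tolerance so orderings equal to the observed value count.
        if rho_no_ties(perm) >= rho_observed - 1e-12:
            count += 1
    return count / total


# Cell 7 of Table 1: rho = +.619 with N = 8 (8! = 40,320 orderings).
print(exact_upper_tail_p(8, 0.619))
```

Running the same function with a larger `n` and the same `rho_observed` shows how quickly the tail probability shrinks as groups are added, although full enumeration becomes slow beyond about 10 groups.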

The negative correlation in Cell 6 can be
explained much more easily. An inspection of
Fiedler’s model in Figure 1 shows that no 
actual studies had ever been performed in 
this cell; the curve had merely been extended 
from Cell 5 to Cell 7. This extrapolation re- 
sulted in a prediction that high LPC leaders 
would be more effective than low LPC leaders 
where moderately poor leader-member rela- 
tions existed, the task was structured, and 
position power was weak. 

Since this study is the first to measure such 
conditions, it may be that the extrapolation 
of the model was unwarranted. There ob- 
viously is no reason why the curve cannot 
dip below the line in Cell 6 and rise again in 
Cell 7. In fact, it may be argued that the
existence of moderately poor leader-member
relations raises the anxiety level of the high
LPC leader, since good relations are of primary
importance to him, and that his efforts
to develop better relationships actually in-
terfere with the performance of a task which
is basically well structured and requires
better direction. On the other hand, a low
LPC leader is not as alarmed by the existence 
of moderately poor leader-member relations 
and continues focusing his attention on task 
performance improvements in the structure 
which may lead to greater effectiveness. 

It must be remembered that this model sug- 
gests only that one type of leader tends to be 
more effective than another type in a given 
cell on the favorability continuum. This im- 
plies that special conditions may enable the 
latter type of supervisor also to be effective. 
Thus, the model only suggests that, ceteris 
paribus, one type is more likely to succeed 
than another. 


Coacting Groups 


The study conducted in the hospital pro- 
vided tests of the contingency model in Cells 
1, 3, 5, and 7, in coacting groups. The results 
from the hospital study indicate that all cor- 
relations are in the predicted direction al- 
though a significance level of .05 was attained 
only in Cell 5 as shown in Table 2. This sug-
gests that when leader-member relations are
moderately poor, the task is unstructured,
and position power is strong, high LPC
leaders are more effective than low LPC
leaders. This conclusion is consistent with the
literature which generally holds that a super- 
visor dealing with professional people who 
perform unstructured tasks should adopt a 
democratic rather than an authoritarian lead- 
ership style. 

Although correlations in cells other than
Cell 5 did not reach an acceptable level of
significance, it may be that the reasons ad-
vanced previously can account for the lack of
significance. Certainly, the model has intuitive
appeal and warrants further replication.

RELEVANCE OF RATER-RATEE ACQUAINTANCE IN
THE VALIDITY AND RELIABILITY OF RATINGS 1


NORMAN E. FREEBERG 2


Educational Testing Service 


Unacquainted Ss worked in three-man groups under relevant (mathematical 
tasks) and irrelevant (socializing) acquaintance conditions. The Ss rated one 
another on scales that defined several cognitive skills. They were also rated 
on these same scales by observers who were dependent on visual information, 
exclusively, and were unacquainted with the group members or the specific 
nature of the tasks being performed. As hypothesized, group members under 
the relevant acquaintance condition achieved consistently good validity 
for all three cognitive areas—with the best validity for ratings of math 
ability. Validity under the irrelevant acquaintance condition was nil on all 
scales. Observers, surprisingly, achieved significant validity (although at lower 
levels than participating group members) for ratings under the relevant 
acquaintance condition. Levels of inter- and intrarater reliability were not
associated with levels of validity under the various rating conditions.


Efforts devoted to improvement of the rat- 
ing process have focused primarily upon con- 
tent, construction, and format of the rating 
instrument. Only limited attention has been 
paid to the nature of the rater-ratee relation- 
ship—in terms of the conditions under which 
raters make their observations or the avail- 
ability of behavioral cues relevant to the 
characteristics being judged. 

The importance of this aspect of rating has 
not gone unnoticed in general discussions by 
such writers as Thorndike and Hagen (1961) 
who conclude that “the ideal rater is the
person who has had a great deal of oppor- 
tunity to observe the person being rated in 
situations in which he would be likely to show 
the qualities on which ratings are desired 
[Ch. 13].” Burtt (1942) has similarly 
stressed “the conditions under which the ratee
has been observed” as affecting the accuracy 
of rating and suggests that, in addition to 
requiring information regarding how long or 
how well the rater knows the ratee, one might 
also request information regarding the cir- 
cumstances under which he was known. In 
his mathematico-deductive Theory of Rating, 
Wherry (1952) derives a theorem which 
specifies that “raters will vary in the accuracy
of ratings given in direct proportion to the
relevancy of their previous contact with the
ratee [p. 10].”

1 The research reported in this paper is based, in
part, on data utilized for a doctoral dissertation sub-
mitted to the Ohio State University.

2 Requests for reprints should be sent to the
author, Developmental Research Division, Educa-
tional Testing Service, Princeton, New Jersey 08540.

Empirical verification of these logical con- 
tentions is relatively scarce. Several studies 
have dealt with a concept of rater-ratee ac- 
quaintance based largely upon the length of 
time that the ratee was known by the rater 
(Ferguson, 1949; Knight, 1923; United States 
Department of the Army, 1952). Findings 
from these studies indicate that with an in- 
creasing degree of acquaintance there tends 
to be an increased “leniency” (i.e., skewness 
of trait distributions toward the favorable end 
of the scale) and higher intercorrelations 
among traits which is assumed to be the 
product of greater halo effect. Madden (1961) 
discusses a similar effect for raters evaluating 
jobs with which they have varying degrees of 
familiarity. Increased rater reliability is also 
shown with increases in “opportunity to ob- 
serve” (United States Department of the 
Army, 1952) or greater degree of acquaintance 
(Ferguson, 1949; Kornhauser, 1926). How- 
ever, failure to find such significant increases 
in reliability with increasing acquaintance is 
reported by Mays (1954) and Hollander 
(1956, 1957) for peer ratings obtained from 
OCS classes. Rater reliability in the form of 
interrater consistency in one such study gen- 
erally remained high throughout a 13-wk. 
period as did agreement between early and 
later peer nominations (Hollander, 1957). 


Claims for improved validity and better 
trait discrimination with increased familiarity 
are reported for supervisor ratings of sub- 
ordinates (Ferguson, 1949) and students’ rat- 
ings of teachers (Bare, 1954). But again, there 
are dissident findings that serve to hint at the 
complexity of the problem and the probable 
effects of rating conditions. An early study by 
Moore (1937) indicated no improvement in 
teachers’ ranking of their students’ intelligence 
over one to three years of acquaintance; while 
Hollander (1956) points out that, although 
peer nominations did show increased average 
validity with increasing length of acquaint- 
ance, the validity is stable over time if degree 
of friendship is partialed out of the nomina- 
tions. In any of the above studies there is 
little or no control over the specific degree 
and character of the rater-ratee relationship 
as well as serious limitations in the scope or 
objectivity of available criteria used to demon- 
strate validity. 

One solution to such inadequacies is to 
organize individuals into groups, wherein rea- 
sonable control over the extent and nature of 
rater-ratee contact can be exercised. When 
this is done, trait behaviors to be evaluated 
can also be elicited, in terms of those specific 
behaviors that define any given trait, and for 
which objective criteria are readily available. 
By means of such an approach, the present 
study attempts to deal with the relevance of 
rater-ratee contact as it affects the reliability 
and validity of rater judgments. 


METHOD 
Subject Groups 


Participants. Sixty-nine undergraduate males were 
organized into 23 groups, with each group composed 
of three members referred to as “participants.” The 
first experimental session in which each group served 
was designated as the Relevant Contact condition. 
From the same pool of 69 Ss, 21 three-man groups 
were reorganized for a second session designated as 
the Irrelevant Contact condition (wherein none of 
the three participants had previously served with 
one another under the Relevant Contact condition). 
Participants in any given group were totally un- 
acquainted with one another prior to the group ses- 
sions. All identification during the group sessions, 
for rating or any other purpose, was by reference to 
different colored laboratory coats worn by each 
participant. 

Observers. For both experimental sessions, two male 
students designated as “observers” viewed the ac-
tivities of the three-man participant groups through a
one-way vision screen. A different pair of observers 
viewed every session; they were completely un- 
acquainted with the participants they observed. 


Experimental Conditions 


Relevant contact. The Ss in the participant group 
entered a room containing a separate table and chair 
to be used by each man and a large drawing board 
near the center of the room to be used by the three 
men as a group. The first 10 min. were spent by each 
S working independently at one of the tables on a 
set of arithmetic and algebraic problems under in- 
struction to complete as many of the problems as 
possible. This was followed by an additional 10-min. 
session during which the three Ss were to compare 
their answers and discuss any discrepancies in solu- 
tions. They were then given a set of similar problems 
at a more difficult level and instructed to solve them, 
as a group, while working together at the drawing 
board. The Ss were to arrive at a single answer for 
each problem and to complete as many problems as 
possible during an allotted 30 min. 

The rationale for individual problem solving and 
comparison of solutions, prior to working the prob- 
lems as a group, was to expose Ss to one another’s 
arithmetic ability. This was intended to maximize the 
opportunity for each S to display “relevant” (i.e., 
arithmetic) skills and minimize dominance of group 
activity by an aggressive, but less capable, group 
member. The requirement for one agreed-upon solu- 
tion was also intended to serve this purpose. 

Included in the instructions to Ss was the fact that 
they would be in competition with other groups and 
that monetary prizes would be awarded to members 
of groups which solved the greatest number of prob- 
lems correctly. It was explained that the “mirrors” 
at each end of the room were one-way observation 
screens and that they would be observed during the 
course of the session. 

Observers behind the one-way screens were re- 
stricted to visual cues only and were unable to hear 
any conversation in the room. From their viewing 
position, they could not observe what the participants 
were writing either at the tables or at the drawing 
board. Thus the specific nature of the activity should
have been relatively unknown to them and, in fact,
these observers were intended to represent raters 
having minimal contact or relevant cues. 

Irrelevant contact. At least two weeks from the 
time of the Relevant Contact sessions, three-man 
groups, reorganized from the original pool of Ss, re- 
turned to participate in a second session of group ac- 
tivity designated as the Irrelevant Contact condition 
(i.e., irrelevant to arithmetic performance). The task 
chosen required that the three Ss, working as a group, 
construct an “artistic” product of some sort from 
the material in Tinker Toy sets. The intent in these 
sessions was to provide a task that would elicit be- 
havioral cues largely irrelevant to the traits under 
evaluation. As during the relevant contact sessions, 
observer raters were given no information other than 
what they could derive from visual observation of 
the group’s activity in the test room. 


Rating Scales and Criterion Measures


At the completion of any given Relevant or Ir- 
relevant Contact session, ratings were obtained from 
each participant S for each of his two fellow group
members. Each observer also rated each of the three 
members of the participant group observed. The 
same set of three scales of 10 items each was used 
for all participant and observer ratings. One was 
designated as the Mathematical Ability scale (M), 
another as the Academic Ability scale (A), and a 
third as the Intellectual Ability scale (I). The items 
for the respective scales consisted of behavioral de- 
scriptions to be rated on a 5-point continuum that 
would characterize (a) someone competent at work- 
ing mathematical problems (e.g., “He does arithmetic 
computation rapidly without unnecessary hesita-
tion”); (b) someone who functions as a capable
student (“He has all of his assignments completed on 
time”); and (c) someone who is intellectually com- 
petent (“He seems to be alert and well aware of 
what is going on around him”).

For the purpose of validating each of the three 
cognitive ability scales under the various rating con- 
ditions, the objective criteria consisted of scores for 
each participant on the Ohio State Mathematics test, 
the Ohio State Psychological Examination (OSPE—a 
verbal test highly related to measures of intellectual
ability), and academic grades in the form of the
cumulative point-hour ratio.


Data Analysis 


From each of the experimental sessions, a pair of 
rating scores was obtained for each participant 
(ratee) on each of the three scales (Math, Academic, 
and Intellectual scales). These pairs of ratings, which 
were obtained from the two observers and the ratee’s 
two fellow participants, were summed to provide a 
single score. Thus, for each ratee there was a total 
of 12 rating scores derived from the combination of 
three scales, used by two types of raters, under two 
contact conditions. Along with the three criterion
measures, these variables were intercorrelated in a
15 × 15 matrix that was factor analyzed by the
Thurstone (single) Group Centroid Method (Thur-
stone, 1947). Since there were pairs of rating scores
for each scale under each rating condition, the degree 
of agreement between raters (interrater reliability) 
could be determined by intraclass correlations. For 
measures of rater self-consistency (in essence, scale 
reliability) Kuder-Richardson (21) estimates were 
obtained. Thus, these two forms of rater reliability 
could be contrasted with the degree of rater validity 
achieved for the same rating scales under the same 
rating conditions. 
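
The bookkeeping and the interrater index described above can be sketched briefly. The article does not state which form of the intraclass correlation was used, so the one-way (single-rating) form below is an assumption, and the names `SCORE_KEYS` and `intraclass_correlation` are hypothetical.

```python
from itertools import product

# Three scales x two rater types x two contact conditions = 12 summed scores.
SCORE_KEYS = list(product(("M", "A", "I"),
                          ("participant", "observer"),
                          ("relevant", "irrelevant")))


def intraclass_correlation(ratings):
    """One-way intraclass correlation for a single rating.

    ratings: one list per ratee, each holding the k ratings that ratee
    received (k = 2 for the rating pairs described in the text).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical pairs of scale totals for three ratees.
print(len(SCORE_KEYS), intraclass_correlation([[31, 33], [40, 38], [25, 27]]))
```

The coefficient approaches 1 when the two raters of each ratee agree closely relative to the differences between ratees, and it can fall below zero when raters disagree more within ratees than ratees differ from one another.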


TABLE 1 
TABLE 2
ROTATED ORTHOGONAL FACTOR LOADINGS AND COMMUNALITY ESTIMATES

                                                      Factors
Variable                                   1     2     3     4     5     6     h²

 1. Scale M - Relevant Contact (P)        61    61    29    02   -02   -03    83
 2. Scale A - Relevant Contact (P)        49    51    45   -04    07    02    71
 3. Scale I - Relevant Contact (P)        58  [illegible]  58   -05   -05    05    81
 4. Scale M - Irrelevant Contact (P)      13   -14    00    86    08   -03    74
 5. Scale A - Irrelevant Contact (P)      21  [illegible] -06    81   -01    06    71
 6. Scale I - Irrelevant Contact (P)      20    00    05    82   -07   -04  [illegible]
 7. Scale M - Relevant Contact (O)        36    53    03    04    56   -02    72
 8. Scale A - Relevant Contact (O)    [illegible]  48    13    01    65    10    75
 9. Scale I - Relevant Contact (O)        38    49    13    10    72   -06    93
10. Scale M - Irrelevant Contact (O)      09    10   -04    01   -01    85    74
11. Scale A - Irrelevant Contact (O)      27    04   -13    09    06    70    56
12. Scale I - Irrelevant Contact (O)      30   -05    14    02   -06    86    68
13. O.S.P.E.                              80    09    00   -07   -02    09    65
14. Point hour                            80   -03    04   -14    04   -01    62
15. Math. test                            50    48   -09   -02   -06   -13    45














RESULTS AND DISCUSSION
Rater Validity 


The intercorrelations presented in Table 1 
show a pattern of rater validity that is ob- 
viously superior for the condition under which 
observable behaviors are relevant to the char- 
acteristics being rated. The highest rater 
validity under the Relevant Contact condi-
tion (r = .60) occurs for participants rating
one another on mathematical ability. In fact, 
all of the scales used by participants under 
the Relevant Contact condition possess sig- 
nificant validity for all three criteria. Logi- 
cally, this is a result of halo effect and re- 
flects the interrelation among the three cri- 
terion measures of academic, intellectual, and 
mathematical abilities. Thus, a rater who must 
depend, primarily, upon his observations of 
one area of a ratee’s abilities (i.e., mathe- 
matics skill) uses such available behaviors as 
the basis for judging other cognitive skills. 
His accuracy (validity) in making judgments 
of these other nondirectly observed abilities 
is, apparently, dependent upon the extent to 
which observed and nonobserved abilities are 
interrelated.

Observers, under the Relevant Contact con-
dition, also achieved significant validity with
each scale but at a lesser level of validity than
the participants (r’s ranging from .23 to .38).
Given the rather minimal information that
they possessed, the achievement of any level
of accuracy is surprising and of particular in-
terest for subsequent discussion.
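
A validity coefficient of the kind reported here can be illustrated as the correlation between the summed pair of ratings for each ratee and his score on the matching criterion. The sketch below assumes the product-moment coefficient, which the r notation suggests but the text does not state outright. The function names and the data are hypothetical and are not values from the study.

```python
def summed_ratings(rater_a, rater_b):
    """Add the two raters' scale totals for each ratee."""
    return [a + b for a, b in zip(rater_a, rater_b)]


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical Scale M totals from two raters and a hypothetical math score.
ratings = summed_ratings([30, 42, 25, 38], [28, 45, 27, 36])
criterion = [55, 80, 50, 70]
print(pearson_r(ratings, criterion))
```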

It is, however, the factor structure for all 
ratings under the various rating conditions 
that provides the clearest picture of the over- 
all effects. Six factors were extracted, based on 
a criterion of reducing residual values to ap- 
proximately zero (the residual matrix of 
Table 1 indicates that the criterion was, es- 
sentially, satisfied). Factors were rotated, 
manually, to psychological meaningfulness. 
The first factor could be classed as a general 
one representing accuracy in judging General 
Cognitive Ability, since the highest loadings 
appear for the three criterion measures and 
most of the other variables achieve positive 
loadings on this factor. Participants under the 
Relevant Contact condition display the great- 
est capability to make such judgments. The 
next highest contribution to the factor appears 
for the ratings by observers under the Rele- 
vant Contact condition, while ratings under 
the Irrelevant Contact condition show the
least contribution to this dimension. 
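
For orthogonal factors, the communality of a variable is the sum of its squared loadings, which gives a quick check on the tabled values. The sketch below applies this to the first row of Table 2 (Scale M, Relevant Contact, participants), with loadings entered as decimals. Because the tabled h² values are described as estimates, other rows need not agree with the sum of squares exactly. The function name is hypothetical.

```python
def communality(loadings):
    """Sum of squared loadings across orthogonal factors."""
    return sum(a * a for a in loadings)


# Row 1 of Table 2, decimal points restored.
scale_m_relevant_participants = [0.61, 0.61, 0.29, 0.02, -0.02, -0.03]
print(round(communality(scale_m_relevant_participants), 2))  # 0.83, as tabled
```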

Factor 2 provides the clearest pattern of 
the effectiveness of a relevant acquaintance 
condition on rater validity. The factor is best 
designated as Computational (Arithmetic) 
Ability, with the Mathematics Test as the
only criterion measure loading on the factor
and loadings of interpretable magnitude for 
rating conditions occurring only under the 
Relevant Contact condition (participants and 
observers). Ratings made on the basis of 
ratee performance irrelevant to the charac- 
teristics being rated make virtually no con- 
tribution to this factor. Essentially, then, 
raters who have had an opportunity to deal 
with ratees in some form of arithmetic skills 
are unquestionably more accurate in making 
judgments of such skills. 

The four remaining dimensions are striking 
examples of “specific halo,” that is, specific 
to each of the four rating conditions. Thus, 
Factor 3 defines halo effect for Relevant Con- 
tact participants; Factor 4 defines it for Ir- 
relevant Contact participants; Factor 5 is 
halo specific to observer ratings under the 
Relevant Contact condition; and Factor 6 
represents halo for observers under the Ir- 
relevant Contact condition. From the mag- 
nitude of the loadings it can be seen that 
variance attributable to specific halo is larg- 
est under the Irrelevant Contact conditions. 

This illustration of a strong halo effect for 
the three rating scales under each rating con- 
dition would lend further support to the above 
contention that halo served to spread validity, 
under the Relevant Contact conditions, from 
observation of arithmetic performance to rat- 
ings of related academic and intellectual per- 
formance. Halo effect may, therefore, be said 
to lend validity to nonobserved characteristics 
insofar as these are related areas of perform- 
ance not directly observed (for example, one 
may display competence in rating an indi- 
vidual’s athletic ability for numerous sports 
although having observed his performance in 
only one sport). Bingham (1939) has previ- 
ously drawn attention to this positive role of 
rater generalization in terms of wanted and 
unwanted halo rather than the customary con- 
ception of all halo as undiscriminated “blur.” 

But the question remains of how observers 
were able to achieve a significant degree of 
rating accuracy, under the Relevant Contact 
condition, despite the minimal information 
they possessed and on the basis of which it 
was assumed their rating validity should be 
nil (i.e., they could be considered as somewhat
of a control group of raters). Observer
visual information was almost entirely dependent
upon fairly gross, group-interactive
behavior such as conversation directed by one 
member to others, explanatory gestures, an 
individual’s position at the drawing board 
demonstrating something, writing, passively
observing, etc. (Lip reading by observers is as-
sumed to have provided no significant in- 
formation.) These are forms of group be- 
havior which allow for reasonably good defini- 
tion of the degree of dominance exhibited by 
each member of the participant group. As 
indicated in the Method section, specific ef- 
forts were made to enhance group-member 
awareness of which individual possessed the 
greatest arithmetic competence. The most logi- 
cal assumption, therefore, is that this degree 
of observed dominance served as the primary 
dimension along which ratings of arithmetic, 
academic, or intellectual ability were made by 
observer raters. 

If group-member dominance does tend to 
be associated with a higher level of ability 
for the tasks involved, as has been suggested 
by Shevitz (1955), then the explanation of 
these results is tenable.3


Rater Reliability 


Coefficients for intra- and interrater relia- 
bility are shown in Tables 3A and 3B for 
each scale, under each rating condition. 

Internal consistency of the scales—which 
can be considered as a form of intrarater re- 
liability—is fairly good over all conditions, 
with KR-21 estimates ranging from .56 for 
participants rating one another on academic 
ability under the Irrelevant Contact condi- 
tion to .79 for observer ratings of participants’ 
math ability under the Irrelevant Contact 
condition. Contrasting these reliability coef- 
ficients with validities of Table 1 indicates 
that intrarater reliability in a given rating 
situation obviously has little bearing on rater 
validity achieved. 
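
As a minimal sketch of how a KR-21 estimate of this kind is obtained, the standard Kuder-Richardson Formula 21 can be computed from the number of items, the mean total score, and the variance of total scores. The item count and score statistics used in the usage note are hypothetical; the source does not report them.

```python
def kr21(n_items, mean_total, var_total):
    """Kuder-Richardson Formula 21 reliability estimate.

    n_items    -- number of dichotomously scored items (hypothetical here)
    mean_total -- mean of the total scores
    var_total  -- variance of the total scores
    """
    if n_items < 2:
        raise ValueError("KR-21 needs at least two items")
    if var_total <= 0:
        raise ValueError("total-score variance must be positive")
    # KR-21 = (k / (k - 1)) * (1 - M(k - M) / (k * s^2))
    correction = n_items / (n_items - 1)
    return correction * (1 - mean_total * (n_items - mean_total) / (n_items * var_total))
```

For example, `kr21(10, 6.0, 6.0)` gives about .67 for an illustrative 10-item scale with a mean of 6 and a variance of 6.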

Agreement between raters (interrater re- 
liability) is found, overall, to be at a more 

3 For each experimental session, E spent almost
the entire time with the two observer raters and
would conclude that, other than the physical characteristics
of the participants, group interactive behaviors
were virtually the only visual cues that
could be utilized reasonably by an observer asked to
make such judgments of cognitive skills.


TABLE 3

INTRA- AND INTERRATER RELIABILITY

                                             Scale
Condition                  N    Math         Academic     Intelligence

Intrarater Reliability (Kuder-Richardson 21)
Relevant Contact (P)       69   [illegible]  [illegible]  .64
Relevant Contact (O)       69   .74          [illegible]  .61
Irrelevant Contact (P)     63   .68          .56          .66
Irrelevant Contact (O)     48   .79          [illegible]  .66

Interrater Reliability (Intraclass Correlations)
Relevant Contact (P)       69   [illegible]  .08          .34
Relevant Contact (O)       69   .38          [illegible]  .34
Irrelevant Contact (P)     63   [illegible]  .06a         [illegible]
Irrelevant Contact (O)     48   .46          .60          .46

Note.—P = participants; O = observers.
a Correlation not significantly greater than zero.


modest level, with r’s ranging from .06 to .60.
Again, one could hardly make a case for a 
pattern of interrater agreement that coincides 
with rater validity, particularly since the 
highest interrater reliabilities appear for the 
three scales used by observers under the Ir- 
relevant Contact condition, where information 
available to raters was minimal, as was their 
resulting validity. In effect, the evidence tends 
to support Wherry’s (1952) contention that 
“the reliability of a rating scale tells us very 
little about its value, since the apparent re- 
liability may be due to bias rather than true 
score [p. 39].” 


CONCLUSIONS 


Where the nature of rater-ratee contact is 
reasonably controlled, a situation which pro- 
vides an opportunity for more relevant cues 
to be exhibited has been shown to result in 
more accurate ratings. Rater reliability, how- 
ever, in the form of either rater self-con- 
sistency or agreement between raters is not 
necessarily affected by the relevance of the 
observations to the qualities judged. 

Raters, as might be expected, utilize ob- 
served cues relevant to one area of ability in 
order to generalize to other nondirectly ob- 
served characteristics. The resultant effect of 


such halo can be one of enhancing validity for 
ratings of the nonobserved characteristics, in- 
sofar as these are related to the ones ob- 
served. Where ratee behaviors elicited are 
logically irrelevant to the characteristics being 
judged, the ratings have been shown to lack 
validity over several related criterion charac- 
teristics. Raters thus generalize their inac- 
curacies in rating (based upon irrelevant 
cues), as well as their judgments based upon 
relevant observations. 

In practice, then, it seems that ratings de- 
pendent upon mere acquaintance of rater and 
ratee are of doubtful value without specific 
knowledge of the nature of this acquaintance. 
If length of rater-ratee acquaintance is used 
to justify the choice of raters, this can only 
prove defensible where longer acquaintance 
provides greater opportunity to observe be- 
haviors that are relevant to the traits judged. 
Nevertheless, many critical judgments of such 
qualities as an individual’s “patriotism” or 
“reliability” as an employee, are made by 
raters whose contact is often no more relevant 
than that of a neighbor or a social acquaint- 
ance. Further, agreement between raters in 
such practical rating situations is often used 
to bolster the claim of rating accuracy. Here 
again, as has been pointed out, such agreement
between individuals (or self-consistency
by individual raters) is not at all a necessary 
reflection of rater validity. 

In addition to such general conclusions 
drawn from these study data, it is felt that 
an important consideration is the value of 
the method used. Achieving reasonable con- 
trol over the nature and extent of rater-ratee 
contact, by use of selected individuals in a 
small group setting, allows for variation of a 
number of parameters that might be influenced 
by the conditions of acquaintance. A next re- 
search step would, logically, be a more precise 
delineation of degree of relevance as it affects 
rating accuracy over a greater variety of 
traits, from specific cognitive skills to aspects 
of social or attitudinal characteristics.


In two different studies it was found that the contribution of member ability
to group productivity was dependent on both the ability of the member and 
the kind of task organization employed by the group. The first study was 
carried out in a military setting with 40 four-man groups and the second study 
involved 48 three-man groups with undergraduate college students as Ss. When 
the group task required members to cooperate by coordinating their efforts, 
group productivity was significantly affected by both the average ability of 
the group and the ability of the dullest member. When the group task re- 
quired members to cooperate by collaborating, group productivity was not 
significantly affected by either the average ability of the group or the ability
of the dullest member.


A number of reviews (Gibb, 1954; Heslin,
1964; Mann, 1959) have shown that the
abilities of group members are generally related
to group productivity in a positive
manner. Correlations between measures of
task-relevant abilities and group productivity
are typically small, however. One possible
reason for the smallness of these correlations
is the neglect by researchers of the organization
used by members in performing the group
task. The organization most used in studies
of small group performance is a collaborative
one where group members are expected to
cooperate with each other at all stages of
the task activity (e.g., discussion and problem-solving
task). Under these conditions, it has
been found by some researchers that personality
factors are better predictors of group


1 The study was supported in part by Contract
NR 177-472 with the Advanced Research Projects
Agency (Fred E. Fiedler and Harry C. Triandis,


productivity than task-relevant abilities
(Schutz, 1958). Members with superior ability 
are often unable to contribute significantly 
because of personal conflicts and incompati- 
bility with other members. 

Little is known about the relationship be- 
tween abilities and group productivity in 
situations where the group is required to 
cooperate through task coordination rather 
than through collaboration. Coordination oc- 
curs when different tasks are allocated to dif- 
ferent positions and the tasks are then ordered 
by definite precedence relationships. Under 
these conditions all members not only have an 
opportunity to influence the group product, 
but are actually required to contribute. Hence, 
if members of a group are allocated separate 
tasks of equal importance, it is likely that the 
group product will be proportionate to their 
summed abilities. Furthermore, because of the 
definite task sequencing, it is probable that 
the quality of the group product would be 
particularly sensitive to poor performance by 
any one person. This form of cooperation is 
observed in assembly lines where shoddy per- 
formance by one worker often results in an 
inferior product, even though the remaining 
members are quite competent. 

In summary, for tasks where coordination 
is high, group productivity should be positively
related to the summed abilities of all
group members and positively related to the
ability of the least competent member. For 
tasks with high collaboration, group produc- 
tivity should be less strongly related to these 
ability measures. Evidence to support these 
statements was obtained from two experi- 
ments which were concerned with the rela- 
tionships between group structure and pro- 
ductivity. As measures of member ability were 
available, the effects of these abilities upon 
group output could be estimated. 


METHOD 


Coordination and Collaboration 


In order to measure the amount of coordination
or collaboration required by a given organization
structure, two cooperation indices were derived
(O’Brien, 1968; Oeser & O’Brien, 1967; Witz &
O’Brien, in press). These indices can be used whenever
it is possible to identify the positions and tasks
in a group, the allocation relationships ordering tasks
and positions, and the precedence relationships
ordering tasks.


The collaboration index, $C_l$, is given by the formula

$$C_l = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}(p_i t_j) - n}{n(m - 1)} \qquad [1]$$

where $\sum_{i=1}^{m}\sum_{j=1}^{n}(p_i t_j)$ is the sum of the entries in the
task allocation matrix (PT). The (ij) entry in this matrix
has the value 1 if Position $p_i$ has allocated to it Task
$t_j$ and the value 0 if $t_j$ is not allocated to $p_i$; n = the
number of subtasks and m = the number of positions.


An index of strict coordination, $C_o$, is given by
Formula 2, a ratio whose numerator combines double sums
over the entries $(x_i y_j)$ and $(u_i v_j)$ and whose
denominator is $M(m)\,M(n)$.

[Formula 2: the arrangement of the terms is illegible in the source.]

The entries of $(x_i y_j)$ are obtained from the resultant of
a matrix expression that is illegible in the source.
(PT)' is the transpose of (PT) and the symbol $\circ$ indicates
elementwise multiplication. The precedence matrix
is (TT). The (ij) entry in this matrix has the value
1 if Task $t_j$ must be preceded by Task $t_i$ and the value 0
if Task $t_j$ is not preceded by Task $t_i$. The entries of
$(u_i v_j)$ are obtained from the resultant of the following
matrices

$$\bigl((PT)' \cdot (PT) \circ (PT)' \cdot (PT)\bigr) \circ (TT) - \bigl((PT) \circ (PT)\bigr)' \cdot \bigl((PT) \circ (PT)\bigr) \circ (TT).$$

M(m) is proportional to $m^2$ when m is even and to $(m^2 - 1)$
when m is odd; M(n) is proportional to $n^2$ when n is even
and to $(n^2 - 1)$ when n is odd. The fractional coefficients
are illegible in the source.

The indices were used to calculate the collaboration 
and coordination values for the task organizations used 
by groups in the following studies. 
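
As an illustrative sketch, the collaboration index can be computed directly from a 0/1 task allocation matrix, reading Formula 1 as (sum of entries − n) / (n(m − 1)), with m positions (rows) and n subtasks (columns). The function name and the example matrices are hypothetical; only the letter-task and collaboration-story values are checked against Table 1.

```python
def collaboration_index(pt):
    """Collaboration index from a task allocation matrix.

    pt -- list of rows, one per position; pt[i][j] is 1 if position i
          is allocated subtask j, else 0 (hypothetical encoding).
    """
    m = len(pt)       # number of positions
    n = len(pt[0])    # number of subtasks
    if m < 2:
        raise ValueError("the index needs at least two positions")
    if any(len(row) != n for row in pt):
        raise ValueError("all rows must have the same length")
    total = sum(sum(row) for row in pt)
    return (total - n) / (n * (m - 1))


# Four positions all allocated to one task (letter task): full collaboration.
letter = [[1], [1], [1], [1]]
# Three positions all allocated to each of three stories.
story_collaboration = [[1, 1, 1] for _ in range(3)]
# Each position holds its own separate subtask: no collaboration.
one_to_one = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
```

Under this reading the first two matrices give 1.00, as Table 1 reports for the letter task and the collaboration story structure. The one-to-one matrix gives 0.00, which matches the chart task only on the assumption that its four positions were coded as holding four separate subtasks.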


> 


Study I—Army Study 


In this study 160 Australian regular army soldiers 
(NCOs and privates) were assigned to 40 four-man 
groups. Twenty of the groups were given the task 
of writing a recruiting letter and the remaining 
groups were required to prepare two charts showing 
the results of apprentice examinations at Army 
technical schools. Groups in each set of 20 were 
matched on status or rank structure and prior
acquaintance. For each group, the leader was defined
as the soldier with highest rank. 

Recruiting letter task. For this task, the group was 
asked to write a letter to Australians in the age 
group 17-20 yr. old. Group members were told that 
the letter should explain why the Army is a worth- 
while career and should encourage them to enlist 
in the Australian Regular Army. Instructions were 
given to make the letter as persuasive, fresh, and 
original as they could. Time given to discuss and 
write the letter was 45 min. A similar task has 
been used by Fiedler (1967). 

Chart task. Each group was given sheets showing 
the scores of Army apprentices in examinations held 
at various apprentice schools. They were required 
to use this information to construct two charts 
showing the results for two different years. A 
sample chart was provided and also written in- 
structions on how to calculate a percentage and 
construct a chart. Groups were asked to work as 
quickly and as accurately as they could. The time 
taken by different groups to complete the task varied, 
but average time was 40 min. The task involved the 
separate subtasks of (a) counting the number of 
apprentices who passed, (b) calculating percentages
of passers, and (c) constructing the chart. Two people 
were required to work separately on counting the 
number who passed, one person to calculate percent- 
ages, and the fourth person to construct the chart 
itself. 

The structures of the work organizations used 
for these two group tasks are shown in Figure 1. 
In these graphs each position and subtask are rep- 
resented by points. The allocation relationships are 


represented by directed lines from positions to
tasks, and the precedence relationships by directed
lines from subtask to subtask. The collaboration and
coordination values are given in Table 1.

Ability. Upon entry to the Army, each soldier had
been administered an Army General Classification
Test (AGC). This test is a group test which includes
a variety of item types including analogies,
number series, verbal reasoning, patterns, and circle
series. This test was produced by S. Hammond and
Bradshaw as a general classificatory test for IQ
range 70-130. The test correlates .83 with the Otis
Intermediate, .76 with the Otis Higher, and .78
with Raven’s progressive matrices test.

Productivity criteria. The letters produced were
rated by six judges who were all psychologists or
graduate students in psychology. None of the raters
was responsible for the design of the study. Two
of them were full-time army officers. Each rater
was given a short training period to acquaint him
with the five dimensions on which each letter was
to be judged. These dimensions were (a) well-written,
clear versus poorly written, sloppy, awkward,
(b) understandably presented versus confused, incomprehensible,
(c) interesting versus boring, (d)
persuasive versus unconvincing, and (e) original,
creative versus trite, platitudinous.

Ratings for each letter were summed over all judges
using the procedure advocated by Cronbach, Gleser,
and Rajaratnam (1963). Interrater reliability was
.[?]2. For the chart task, quality measures based on
number of errors were obtained using two judges.
Interrater reliability was .95. Performance scores on
each task were converted to 50-10 modified standard
scores.
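
A minimal sketch of this conversion, assuming that "50-10 modified standard scores" denotes the usual linear rescaling to a mean of 50 and a standard deviation of 10. The function name is hypothetical, and the source does not say whether the population or the sample standard deviation was used; the population form is assumed here.

```python
from statistics import mean, pstdev


def to_50_10_scores(raw_scores):
    """Rescale raw scores to mean 50 and standard deviation 10."""
    mu = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD: an assumption, see lead-in
    if sd == 0:
        raise ValueError("scores have no variance; cannot standardize")
    return [50 + 10 * (x - mu) / sd for x in raw_scores]
```

Putting the letter ratings and the chart error counts on this common scale makes productivity comparable across the two tasks even though the raw measures differ in kind.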


RESULTS 


Correlations between AGC scores and productivity
were obtained for both sets of
groups. These correlations are presented in
Table 2. The correlation between the summed
AGC score of a group and productivity was
significant at the p < .05 level for the chart
task, but not significant for the letter task.

Similarly, the correlations of the dullest and
brightest man’s AGC scores with productivity
were significant in the chart task groups, but
not in the letter task. The correlation of the
leader’s AGC score with productivity was
small and insignificant for both task groups.
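
The four ability measures and their correlation with productivity can be sketched as follows. This is an illustration only: the function names and the convention that the leader's score is listed first are assumptions, and no data from the study are reproduced.

```python
from math import sqrt


def group_predictors(scores, leader_index=0):
    """Ability measures for one group; leader position is an assumption."""
    return {
        "sum": sum(scores),
        "dullest": min(scores),
        "brightest": max(scores),
        "leader": scores[leader_index],
    }


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with zero variance has no correlation")
    return sxy / sqrt(sxx * syy)
```

In use, one would compute `group_predictors` for each group, collect one measure (say, `"dullest"`) across the groups, and pass that list with the groups' productivity scores to `pearson_r`.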

TABLE 1

VALUES OF COLLABORATION AND COORDINATION FOR
TASKS USED IN THE ARMY AND LABORATORY STUDIES

Task                                  Collaboration   Coordination

Army study
  Letter                              1.00            0.00
  Chart                               0.00            .40
Laboratory study
  Story: coordination structure       0.00            [illegible]
  Story: collaboration structure      1.00            0.00
  Story: coordination-collaboration
    structure                         [illegible]     [illegible]


It is apparent that the contribution of
group members’ abilities toward productivity
depends on the group’s task. When the task
requires a high degree of collaboration, it appears
that abilities of members are not related
strongly to group productivity. However,
when the task requires a high degree of
coordination, abilities of members are related
strongly to group productivity. Although the
results obtained are consistent with predic- 
tions made concerning the effect of task struc- 
ture on ability-productivity correlations, it is 
possible that the results could be interpreted 
in terms of different abilities required by the 
two tasks. Perhaps the abilities measured by 
the AGC score were relevant to the chart task 
only. Hence, an appropriate way to support 
the structural interpretation of the results 
would be to give a number of groups the same 
task or goal but vary the work organizations 
required to complete the task. 


Study II—Laboratory Study 


This study was designed to study the effects 
of organizational structure, leadership style, 
and member compatibility on small group 
creativity (Ilgen & O’Brien, 1968; O’Brien 
& Ilgen, 1968). Three kinds of interacting 
organizations were employed which differed 
in the amount of cooperation required. The 
goal was to construct three stories from three 
TAT pictures. Sixteen three-man groups were 
formed for each organization. These groups 
were matched in leadership style (as measured 
by Fiedler’s LPC scale) of the appointed 
leader and the personal compatibility of group 
members (as measured by Schutz’s FIRO-B 
scale) (Fiedler, 1967; Schutz, 1958). American
College Testing (ACT) scores on
English were available for each S.

FIG. 1. Digraphs showing the organizational structure employed by groups in the army and
laboratory studies. Directed lines show the allocation relationships between positions and tasks
(p_i t_j) and the precedence relationships between tasks (t_i t_j).
[Figure 1 panels. Army study: letter task (collaboration structure); chart task (coordination
structure). Laboratory study: story task (coordination structure); story task (collaboration
structure); story task (collaboration-coordination structure).]


Work organizations. Organization 1: Coordination,
but no collaboration. Each member
started working on one story and after 20
min. passed his story on to the next man and 
received a story already started by the third 
man. After another 20 min., another exchange 
was made. In this manner all members worked 
on each story, but not at the same time. 
Organization 2: Collaboration, but no coordination.
All members worked together on
each story for 60 min. Organization 3:
Collaboration and coordination. Members
worked together on all stories for 15 min. and
then followed Organization 1.

Digraphs showing the structure of these
organizations are given in Figure 1.


TABLE 2

CORRELATION (PEARSON’S r) OF MEMBER ABILITY SCORE WITH GROUP PRODUCTIVITY

                                    Army study                 Laboratory study
Correlation of productivity with    Letter       Chart         Coordination  Collaboration  Coordination-collaboration

Sum of group abilities              .13          [illegible]   [illegible]   .03            [illegible]
Ability of dullest group member     [illegible]  [illegible]   .49*          -.04           [illegible]
Ability of brightest group member   [illegible]  .48*          [illegible]   [illegible]    .19
Ability of group leader             [illegible]  [illegible]   [illegible]   [illegible]    [illegible]
N                                   20           20            16            16             16

Note.—N = number of groups used in calculating correlation.
* p < .05.

Productivity criteria. The stories were rated
by five graduate students of English on plot
originality, elaboration, plot structure, sentence
structure, expressiveness, humor, and
suspense. Interrater reliability was .82 using
the Spearman-Brown correction.


RESULTS


Correlations between ACT English
scores and productivity for summed, brightest,
dullest, and leader scores are shown in Table
2. The dullest and summed measures were the
only significant correlations, and these occurred
only in the organizations which required
coordination. Hence, the results obtained
in this study are consistent with those
in the Army study in that task-relevant
abilities were significantly related to group
productivity only in those task organizations
requiring coordination and then only for the
summed abilities and the abilities of the
dullest member in each group.


DISCUSSION


The significance of these results lies in the 
demonstration that the contribution of mem- 
ber intelligence to group productivity is 
dependent on both the ability of the member 
and the kind of task organization employed. 
In tasks where there is a high degree of col- 
laboration, it appears that members are unable 
to contribute significantly because the organ- 
ization involves a great deal of interaction 
and prevents the group from organizing the 
best contributions in a systematic fashion. 
Some evidence to support this interpretation 
comes from observer ratings of group inter- 
action. In the creativity study, observers 
recorded the number of comments made by 
each member and also the number of dis- 
agreements between members on the content 
of their stories. Collaborative organizations 
generated more comments and more disagreements
than organizations requiring only coordination
(Table 3). For the groups working
with an entirely collaborative structure, a high
level of interaction was associated with significantly
lower productivity (O’Brien &
Ilgen, 1968). Organizations involving some
degree of coordination had higher productivity
and less interaction than collaborative organizations.


TABLE 3

NUMBER OF COMMENTS AND ARGUMENTS FOR THE THREE TASK ORGANIZATIONS
IN THE LABORATORY STUDY

                                                        Task organization
Number                                        Coordination  Collaboration  Coordination-collaboration

Median number of comments made by group
  members during task performance             72            547            224
Mean number of arguments per 5 min. session   1.25          3.94           2.56


In a task where there is low collaboration 
but high coordination, each member must 
make some contribution to the formation of 
the group product. Under these conditions, it 
is not possible for a single person to make the 
only major contribution, but it is possible for 
the group to organize systematically the con- 
tributions of the group members. For a task 
of this kind, the principle, “A chain is only 
as strong as its weakest link,” seems ap- 
propriate. Poor work by a relatively dull 
person may severely limit the performance of 
brighter members. Only when all members 
have high ability for their particular task is 
it possible for group performance to reach 
a maximal level. 

The correlations between leader ability and 
group productivity are low and insignificant in 
both studies, although they tend to be higher 
in the laboratory study. These results suggest 
that leaders in such groups do not have a sub- 
stantial direct effect on group performance. 
The magnitude of their contribution is prob- 
ably dependent on both the structure of the 
task and the status structure within the 
group. The highest correlations occurred in the
laboratory study, where coordination was used and
where there were only two rank levels (leader 
and member). The majority of army groups 
had more than two rank levels. Further 
research is needed to investigate systematically 
the interrelationship between leader abilities, 
status structure, and group task organization. 

The results of these studies may be specific 
only to tasks where the group is required to 
combine various sources of information into 
one final product. Further research should be 
devoted to identifying organizational effects 
when the task requires groups to generate a 
large number of products from a limited number
of resources (e.g., producing alternative
solutions to a human relations problem). It 
may be that different task types require 
groups to have different organizations for 
optimal effectiveness. These results suggest 
also that the assignment of individuals to 
groups should be made after consideration of
their abilities, the abilities of other group
members, and the type of task organization.
It seems to be inefficient to assign members of 
high ability to groups where the task alloca- 
tion relationships are such that their con- 
tributions are going to be limited by the poor 
performance of relatively incompetent members.


An information questionnaire and Likert-type scales measuring attitudes
toward the influence of aesthetic factors in realty appraisal were mailed to 100 
industrial and 512 nonindustrial real estate appraisers throughout the United 
States. The purpose of the study was to determine if appraisers had a general- 
ized attitude toward allowing aesthetic factors to influence their appraisal 
activities and, if so, to what extent they felt that it did in fact influence them. 
Results indicated that there were two relatively unrelated but approximately 
equally important attitudes toward the issue in question: One attitude re- 
flected the concern of the appraiser for the intended users of the property 
(public vs. private concern or individuals) or the use for which the property 
was intended (recreational vs. business, etc.); the second attitude regarded 
concern for such programs as urban renewal, highway beautification, city plan- 
ning, and modern architectural trends. These results also suggested that 
appraisers are relatively positive in their attitudes, and feel that more weight 
should be given aesthetics in appraisal than is being given or than they 
personally give. These attitudes were also found to be related to the age
of the appraiser and the size of the city from which he operated.


Investments of time, talent, energy, and 
money by contemporary programs of urban 
renewal, highway beautification, city plan- 
ning, etc., make extremely salient the rela- 
tionship between the aesthetic aspects of land 
and land-based structures and the economic 
value of such property. Attitudes and opinions 
of the appraisers of such property constitute a 
reflection of the values of the larger society 
regarding this relationship. Such persons may 
also be considered to possess “informed opin- 
ions” regarding this matter, and, to some ex- 
tent, constitute “opinion makers” for the 
larger society. Thus, their attitudes toward 
aesthetic factors as influences upon the ap- 
praised value of a property seem especially 
relevant to the general topic of the relation- 
ship between aesthetic and economic value. 
The focus of the present study was the as- 
sessment of the attitudes of realty appraisers 


1 This research was part of a larger project study- 
ing aesthetics and economics as they relate to land 
and land-based structures, and was supported by the 
State Highway Department of Georgia and the 
Bureau of Public Roads. 

2 Requests for reprints should be sent to Jack M. 
Wright, 1340 Claremont Drive, Boulder, Colorado
80302. This study was conducted while the author
was at Georgia State College, Atlanta, Georgia. 




regarding the importance of aesthetic factors 
in appraisal activity. More specifically, the 
question being studied was the degree to 
which the aesthetic qualities of such struc- 
tures and their surroundings should or could 
be allowed to influence their economic value, 
from the point of view of the real estate ap- 
praiser. 

Aesthetics was conceptually defined in this 
study as the quality of sensory reaction to a 
physical or nonphysical phenomenon, giving 
the phenomenon some probability of evoking 
evaluative reactions in an observer. Such 
evaluative reactions may be considered emo- 
tional in nature and are based upon or reflect 
the conceptual system of the individual ob- 
server. These reactions are not strict de- 
terminants of overt behavior (such as ap- 
praisal or purchasing) but, in interaction with 
other factors, they guide and direct overt be- 
havior. Such evaluative reactions are consid- 
ered learned and therefore dependent upon 
the history and characteristics of the observer. 
Observer characteristics included in this study 
were age and sex. Principal factors in the 
background of the observer dealt with in this 
study were physical environment (e.g., geographic
location), education, socio-economic
level, and involvement in such activities
as urban renewal.

Factors other than the personal and social 
characteristics of the observer which are of 
relevance include such aspects of the property 
as its intended purpose and the type of use 
expected for the property. Thus, relevant 
features of the properties which were assessed
were whether each would be (a) used for commercial,
residential, or industrial purposes, (b)
developed by a public or private concern, or
(c) used by individuals (e.g., family) or
groups (e.g., corporations, city government).

Our expectations were that appraisers of 
industrial properties would have more neutral 
or negative attitudes toward taking aesthetic 
factors into consideration in appraisal activity 
and that appraisers of residential properties 
would be most positive in their attitudes. 
Other expectations included the following: Ap- 
praisers of higher socio-economic status and 
educational level and those residing in more 
cosmopolitan surroundings would be more pos- 
itive in their attitudes than persons lower on 
these dimensions.


Sampling. The population to be sampled included
2700 appraisers—all members of the American In- 
stitute of Real Estate Appraisers. These persons fell 
into two major groups: 100 industrial appraisers and 
2600 nonindustrial appraisers. Questionnaires were 
mailed to all of the industrial appraisers and to 20% 
of the nonindustrial appraisers. The sample of non- 
industrial appraisers was selected through stratified 
random sampling proportional to the number of 
appraisers living in each state in the United States 
and in the District of Columbia. The states of Hawaii 
and Alaska were left out of the sample for reasons 
of time.
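The proportional stratified draw described above can be sketched as follows. This is a minimal illustration only: the state labels, membership lists, and function names are hypothetical, since the state-by-state counts are not reported, and the rounding rule is an assumption.

```python
import random


def proportional_allocation(stratum_sizes, fraction):
    """Return how many units to draw from each stratum.

    stratum_sizes maps a stratum label (here, a state) to the number of
    population members in it; fraction is the overall sampling fraction.
    Each stratum gets round(size * fraction) draws, so the total can
    differ slightly from fraction * N because of rounding.
    """
    return {name: round(size * fraction) for name, size in stratum_sizes.items()}


def stratified_sample(members_by_stratum, fraction, seed=0):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in members_by_stratum.items()}
    quota = proportional_allocation(sizes, fraction)
    return {
        name: rng.sample(members, quota[name])
        for name, members in members_by_stratum.items()
    }


# Hypothetical membership lists (not the actual appraiser rolls).
population = {
    "State A": [f"A{i}" for i in range(50)],
    "State B": [f"B{i}" for i in range(30)],
    "State C": [f"C{i}" for i in range(20)],
}
sample = stratified_sample(population, fraction=0.20)
print({name: len(chosen) for name, chosen in sample.items()})
```

Because each stratum is sampled at the same rate, states with more appraisers contribute proportionally more respondents.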

Of the 512 questionnaires mailed to the nonin- 
dustrial appraisers, 326 were returned and 282 of 
these were usable. Of the questionnaires mailed to the 
industrial appraisers, 42 of the returns were usable. 
Thus, the final sample was composed of 12% of the 
nonindustrial appraisers and 42% of the industrial 
appraisers. 

Sample characteristics. All except one of the
respondents were male. Their age ranged from 33 to 81,
with a mean age of 52 yr. and a standard deviation 
of 10 yr. Of 321 respondents, 140 had more than 16 
yr. education, 181 had 14 to 16 yr., 56 had 12 to 14 
yr., and only 2 had no college at all. Regarding in- 
come, 169 reported income from appraisal work of 
$20,000 or less per year, 63 reported $30,000 or less,
and 84 reported above $30,000. Of 322 persons
reporting, appraisal was the full-time occupation of
139, it constituted 75% of the occupation of 51
persons, 50% for 58 respondents, and 25% or less for
74 respondents. The median for the size of the city 
from which these persons operated was 100,000 to 
499,000 in population. Most of the sample, in their 
appraisal work, ranged across from one to three 
states; 31 worked only in one city; 147 worked in 
one state; 82 worked in two to three states; 43 
worked in more than three states but not nationwide; 
21 reported making appraisals throughout the nation. 

Regarding involvement in activities which might 
be expected to influence their attitudes toward 
aesthetics, 11 persons were involved in condemna- 
tion proceedings, 2 were involved in urban renewal 
activities, 14 in highway or other government land 
acquisitions, and 108 in mortgage loan appraisal. Of 
the persons reporting, 147 were involved in two or 
more of these activities. Only 5 persons reported no 
involvement in any such activities. 

Methods. Two instruments were mailed to Ss, 
with a letter of explanation. The first of these instru- 
ments was a general information questionnaire con- 
taining 12 questions designed to elicit information 
from them regarding personal and social character- 
istics. The second instrument contained 30 Likert- 
type statements designed to measure attitude toward 
the influence of aesthetic factors on realty appraisal.
For the purpose of this questionnaire, aesthetics was
defined for the respondent as being “beauty or the
appreciation of the beautiful,” after Cohen (1941).
Also contained in this second instrument were 3
questions regarding actual behavior (as opposed to
opinion or belief) of the respondent in his appraisal
activity.

The response mode for the general information 
questionnaire varied from checking one of several 
alternatives to writing an answer. For the attitude 
statements, respondents encircled one of five al- 
ternatives to reflect the degree of agreement with the 
statement that they felt: Strongly agree, Agree, Un- 
decided, Disagree, and Strongly disagree. These al- 
ternatives were weighted as 5 (Strongly agree) 
through 1 (Strongly disagree) for positive items; 
weights for negative items were reversed. The re- 
spondent’s score was the sum of the weighted alterna- 
tives endorsed by him for these 30 items. High scores 
indicate a positive attitude toward allowing aesthetic 
factors to influence real estate appraisal. 
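The scoring rule just described can be sketched in a few lines. The response labels and weights follow the text; the function names and the three-item example are hypothetical, and the midpoint is taken as the score obtained by answering “Undecided” throughout.

```python
# Weights for positively worded items, as described in the text.
WEIGHTS = {
    "Strongly agree": 5,
    "Agree": 4,
    "Undecided": 3,
    "Disagree": 2,
    "Strongly disagree": 1,
}


def item_score(response, positive=True):
    """Weight one response; negatively worded items are reverse-scored."""
    weight = WEIGHTS[response]
    return weight if positive else 6 - weight


def attitude_score(responses, keying):
    """Sum the weighted responses over all items.

    responses: list of response labels, one per item.
    keying: list of booleans, True for a positively worded item.
    """
    return sum(item_score(r, k) for r, k in zip(responses, keying))


def theoretical_range(n_items):
    """Lowest score, highest score, and midpoint for n_items five-point items."""
    return n_items * 1, n_items * 5, n_items * 3


# Hypothetical example: two positive items and one negative item.
responses = ["Agree", "Strongly agree", "Strongly disagree"]
keying = [True, True, False]
print(attitude_score(responses, keying))  # 4 + 5 + 5 = 14
print(theoretical_range(30))              # (30, 150, 90)
print(theoretical_range(22))              # (22, 110, 66)
```

Reverse-scoring the negative items means that a high total always indicates a favorable attitude, whichever way an item is worded.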

Two of the last three items asked the respondent to 
encircle a number (with alternatives 0-10) to indi- 
cate how much weight he gave aesthetic factors in 
his last appraisal activity, and how much he gen- 
erally gave such factors; 0 indicated very little or 
no weight and 10 indicated that aesthetic factors were 
the sole determinant of appraised value. The last 
question requested that the respondent indicate the 
nature of the last property appraised: residential,
commercial, or industrial. 


RESULTS

The 22 attitudinal items correlating with
the sum score of the 30 items at a high level
of statistical significance (p < .01) were selected
for data analysis; most items possessed
correlations of .30 or more with total score.
A split-half reliability coefficient (corrected)
for these items was only .57, and analysis of
interitem correlations indicated that two factors
were contributing to total score. A second
set of 12 items was selected on the basis of a
correlation with total score of .30 or higher;
the split-half reliability coefficient for this set
of items was .75 (corrected). The 22 items
originally selected are given in Table 1 along
with their correlations with total score; items
included in the later 12-item scale are marked
with an asterisk. The following report is based
on the sum scores derived from the 22-item
version of the scale.


TABLE 1

Item    r             Opinion

1A*     .37           Aesthetic qualities are more important in the appraisal of residential than of industrial property.
2       .27           Aesthetic qualities of a property have more weight in appraisal when the potential occupant or
                      tenant is a private person than when the user is a group of persons (corporation, city
                      government, etc.).
3B*     .44           I feel that the current prevalence of urban renewal programs indicates that greater emphasis must
                      be placed on aesthetic factors in realty appraisal.
4B*     .45           The federal highway beautification program is an indication of the greater demand by the public
                      for beauty in land and land-based structures.
5B*     .40           Modern city planning is a testimony to the growing public desire to create an esthetically pleasing
                      physical environment in which to live.
6       .28           Aesthetic factors are taken into consideration most of the time by most realty appraisers in their
                      work.
7*      [illegible]   Much less consideration should be given aesthetic factors in realty appraisal than is presently
                      given them.
8       .22           The public demand for efficiency, “modern appointments,” and relative low cost is greater than
                      its demand for aesthetics in single family housing.
9*      .34           The suburban nightmare of “look alike” “cracker box” houses demonstrates a lack of concern
                      for aesthetics on the part of the builders.
10A*    .41           Aesthetics is more important in the appraisal of commercial than of industrial property.
11A*    .33           Aesthetic factors are more important in the appraisal of residential than commercial property.
12A*    .39           For property which is intended for recreational use, aesthetic factors must be given greater
                      consideration in making an appraisal than for other properties.
13      [illegible]   The beauty of a physical structure (e.g., a home) is more important than the condition of its site
                      in affecting its appraised value.
14B*    .37           The growing tendency in modern architecture to build homes which blend into their natural
                      surroundings indicates an increased appreciation of the need to give more consideration to
                      aesthetic factors in construction.
15B*    .34           Most of my clients are actively concerned about the aesthetic aspects of the structures they wish
                      to have appraised.
16      .26           I do not give aesthetic factors much weight in appraising property because I feel that the market
                      gives much more interest to utilitarian or functional factors.
17*     [illegible]   Structures which possess historical features may reflect increased market value.
18      .29           Structures which possess traditional features may reflect increased market value.
19      .26           A well-maintained older residential property will usually increase in value.
20      .24           Aesthetic factors are more important in the appraisal of structures located well outside of the
                      periphery of a city and its suburbs than in appraisal of structures which are more centrally located.
21A*    [illegible]   Corporate bodies such as city governments, etc., are less concerned with aesthetic features of
                      properties they purchase than are private companies or individuals.
22      .30           Aesthetic factors are more important in appraisal of properties which are intended for commercial
                      enterprises dealing in services than for those dealing in goods and production.

Note. Items marked with an asterisk (*) are those in the shorter 12-item version of this scale. Items marked with “A” belong
to the first cluster. Items marked with “B” belong to the second cluster.
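The item-selection and reliability figures reported here can be illustrated with a short sketch. Two points are assumptions rather than facts from the study: the “corrected” split-half coefficient is taken to be the Spearman-Brown stepped-up value, and the halves are formed from alternate items. The response matrix is hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_total_correlations(scores):
    """Correlate each item with the sum score (the item itself included).

    scores[p][i] is the weighted response of person p to item i.
    """
    totals = [sum(row) for row in scores]
    n_items = len(scores[0])
    return [pearson_r([row[i] for row in scores], totals) for i in range(n_items)]


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(scores):
    """Correlate alternate-item half scores, then apply the correction."""
    first = [sum(row[0::2]) for row in scores]
    second = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson_r(first, second))


# Hypothetical responses: six persons by four items, already weighted 1 to 5.
scores = [
    [5, 4, 5, 4],
    [4, 4, 3, 5],
    [3, 2, 3, 2],
    [2, 3, 2, 2],
    [4, 5, 4, 3],
    [1, 2, 2, 1],
]
print([round(r, 2) for r in item_total_correlations(scores)])
print(round(split_half_reliability(scores), 2))
```

Under the Spearman-Brown assumption, a corrected coefficient of .75 corresponds to a correlation of .60 between the two halves.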

For this 22-item scale, the theoretical range
of possible scores is from 22 to 110 with an
expected mean of 66. The actual mean obtained
was 84.4 (N = 325), with a standard
deviation of 7.1. Thus the distribution of
scores was relatively skewed, with a greater
frequency of positive than of negative attitudes
toward giving aesthetic factors weight
in appraisal. On the whole, the attitudes
sampled ranged from neutral to positive.

The attitude scores of respondents were then
correlated with their report of the amount of
weight given aesthetic factors in their last
appraisal of a property. Attitude was found
to be positively and highly significantly correlated
with their report of the amount of
weight given, a correlation of .33 being obtained
with an N of 325 (p < .001). However,
it is to be noted that the mean response
on a continuum from 0 to 10 on this question 
was 3.0 with a standard deviation of only 2.2. 
The truncated distribution of scores on this 
question would be expected to reduce the size 
of the correlation obtained. Evidence of the 
validity of the scale rests on the content 
validity of the items, the correlations of items 
with total score, and the correlation of at- 
titude score with reports of the respondents of 
actual weight given aesthetics by them in 
making their most recent appraisal. 
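As a rough check on the significance level quoted for the correlation of .33 with N = 325, the usual t transformation of r can be computed directly. This is only a sketch: the two-sided probability uses a normal approximation to the t distribution (reasonable with 323 degrees of freedom), and the function names are illustrative.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a correlation against zero, with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def two_sided_p_normal(z):
    """Two-sided tail probability under the standard normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))


t = t_for_r(0.33, 325)
p = two_sided_p_normal(t)  # normal approximation, an assumption of this sketch
print(round(t, 2), p < 0.001)  # 6.28 True
```

A t of about 6.3 lies far beyond the .001 critical value, which agrees with the reported significance level even though the correlation itself is modest.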

Examination of the patterns of item intercorrelations
and perusal of item content indicate
that there are two relatively unrelated
factors of approximately equal importance 
contributing to these scores. The first of these 
clusters of items (Cluster A) involves ques- 
tions reflecting the degree of concern of the 
appraiser for the intended users of the prop- 
erty (public institutions such as city govern- 
ments vs. private concerns such as companies 
or individuals) or for the use for which the 
property is intended (recreational vs. busi- 
ness, residential, or industrial use). The sec- 
ond cluster (Cluster B) involves questions re- 
garding concern for such programs as urban 
renewal, highway beautification, city plan- 
ning, modern architectural trends, etc. The 
pattern of correlations of these items with 
certain other key items (“Aesthetic factors 
are taken into consideration most of the time 
by most realty appraisers in their work” and 
“Much less consideration should be given
aesthetic factors in realty appraisal than is
presently given them”) is highly consistent
and seems to indicate the following: Many
appraisers feel that greater weight should be
given aesthetic factors in realty appraisal than
is given them at present; further, they feel
that present concern with such programs as
urban renewal, highway beautification, and
city planning reflects a growing public desire
for aesthetics to be given more weight. The
specific items involved in these two clusters 
are identified in Table 1. 

Regarding characteristics of the sample, age 
was found to be positively and significantly 
correlated with attitude score, such that older 
men were more positive in attitude toward 
giving aesthetic factors weight in assessing 
property (p < .05). Size of city from which 
the individual was operating was also signi- 
ficantly related to attitude scores. However, 
contrary to expectation, persons operating 
from smaller cities apparently were more positive
in their attitudes than were persons operating
from larger cities (p < .01). Other
factors such as educational level, yearly in- 
come, and range of operation (how many 
states were covered in the individual’s work) 
were found not to be related to attitude to a 
significant extent. This latter finding was sur- 
prising as it was expected that these three 
variables would reflect the general sophistica- 
tion of the individual appraiser, which would 
in turn produce differences in attitude. Cor- 
relations involving age and city size improved 
when recomputed using the later 12-item 
scale score. However, correlations involving 
educational level, yearly income, and range of 
operation remained nonsignificant when re- 
computed using the scores from the 12-item 
scale. 

The mean attitude scores of industrial and 
nonindustrial appraisers were also compared, 
but no differences were found; in fact, they 
were almost precisely the same, being 85.2 
and 84.2, respectively. Both means had a 
standard deviation of 7.1. When compared on 
response to the question, “Most of my work
is with: commercial, industrial, or residential 
property,” the differences were in the ex- 
pected direction but did not approach sig- 
nificance. By expected direction is meant that 
persons reporting that most of their work was 
with residential property had the highest mean 
attitude score, and those reporting that most 
of their work was with industrial property had 
the lowest mean attitude score; persons re- 
porting that most of their activity involved 
commercial property had a mean attitude
score which was intermediate between the
other two. 
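The comparison of industrial and nonindustrial appraisers can be retraced from the summary figures alone. The sketch below assumes a pooled-variance t test and assumes group sizes of 42 and 282 (the usable returns reported under Sampling); the exact group sizes entering the authors' comparison are not stated.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled variance estimate.

    Returns the t value and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Means and SDs from the text; group sizes are an assumption of this sketch.
t, df = pooled_t(85.2, 7.1, 42, 84.2, 7.1, 282)
print(round(t, 2), df)  # 0.85 322
```

A t below 1 with more than 300 degrees of freedom is well short of conventional significance, consistent with the statement that no difference was found.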


DISCUSSION 


The question of whether real estate ap- 
praisers have a general attitude toward 
whether aesthetic aspects of a property should 
or do influence appraisal of that property was 
the focus of the present study. Results based 
on this sample indicate that they do. Further, 
their attitudes seem quite positive in this 
regard. However, the results also indicate 
that they give aesthetic factors less weight (by 
self-report) in actual appraisal activity than 
they appear to feel should be given to such 
factors. 

There were found to be two general, unre- 
lated, but roughly equivalent factors which 
were contributing to the attitude in question. 
One factor was composed of items regarding 
concern for such programs as urban renewal, 
highway beautification, city planning, etc.; 
evidence appeared to indicate that most ap- 
praisers felt that less weight was being given 
aesthetics in realty appraisal than should be 
given and that such programs were evidence of 
a growing public desire for aesthetics to be 
given more weight. The other factor included 
questions reflecting concern on the part of the 
appraiser for the intended user or the intended 
use of the property. Speculatively, these re- 
sults taken together with the fact that ap- 
praisers appear to give aesthetic factors less 
weight than they feel should be given may 
indicate that the appraiser feels restricted or 
inhibited in actualizing his attitude by giving 
more weight to aesthetics. It may be that he 
feels that he is restricted thus by an inade- 
quately developed public concern, or he may 
feel he is restricted by sources of power or influence
that control both him and the public.

Although age of appraiser and size of city
from which the appraiser operated were found 
to be related to the attitude in question, no 
stable inferences regarding the bases for these 
correlations can be generated. Size of city 
was negatively related with positivity of at- 
titude, with persons from smaller cities being 
more positive. Among other possibilities, this 
finding may suggest that land and land-based 
structures in larger cities are so economically 
valuable as to preclude greater consideration 
of aesthetic values. Or it may reflect some 
aspect of the personality of the individual who 
seeks a smaller city from which to operate. 
The positive correlation of age with positivity 
of attitude may indicate a general “mellowing”
and accompanying change in values, it
may reflect the effects of longer experience in 
the profession, or it may indicate a genera- 
tional difference. 

It was expected that such factors as educa- 
tional level, yearly income, and range of op- 
eration in the person’s work might reflect some 
factor such as sophistication which would re- 
sult in the holding of a different attitude. 
However, these factors were found to be un- 
related to attitude. At least for the variable of 
educational level, the failure to generate a re- 
lationship may be the result of an overly 
crude measurement of the education variable; 
only four categories of response were pro- 
vided for this variable. 

Although in the expected direction, no sig- 
nificant differences were found between per- 
sons involved in residential, commercial, or 
industrial appraisal.


DISTORTION OF DRIVERS’ ESTIMATES OF AUTOMOBILE
SPEED AS A FUNCTION OF SPEED ADAPTATION 


FRANK SCHMIDT AND JOSEPH TIFFIN 1


Purdue University 


Ten young male drivers were required to make four estimates of 40 mph speed 
after varying amounts of exposure to an adapting speed of 70 mph. The 
influence of these varying amounts of exposure to the adapting speed on 
speed judgments was studied, and a significant (p < .01) upward distortion of
estimations of 40 mph was found to occur as a function of exposure to the 
adapting speed. The eta between treatment conditions and speed estimates was 
.72, and r was .71. These results were discussed in relation to their implica- 
tions for accident rates and highway construction. 


After driving at a constant speed for a 
period of time, drivers often report that this 
speed seems slower than it did at the beginning 
and that speeds slower than this level seem 
to be extremely slow. This phenomenon has 
been reported to occur immediately after 
drivers exit from a high-speed freeway or 
expressway onto secondary roads with lower 
speed limits, and has been suggested as a 
factor in accidents at such locations (Mat- 
son, Smith, & Hurd, 1955, p. 24). It is 
also reported to occur at points in cross- 
country routes at which the driver is required 
to reduce his rate of speed while remaining 
on the same roadway, for example, while driv- 
ing through small towns, roadside residential 
areas, school zones, etc. Speed adaptation may 
be an important factor in determining driver 
estimation of speed in many driving situations. 

Many factors have been found to play a 
role in determining driver behavior. Goldstein 
(1961; 1967) has reviewed studies in which 
human characteristics have been related to 
driving behavior, and additional studies have 
been done since his most recent review (Bar- 
rett & Thornton, 1968; Schuman, Pelz, 
Ehrlich, & Selzer, 1967; Schwenk, 1967). 
Most research in this area is concerned with 
relationships between personality and perceptual
variables and accident rates. These
personality variables may also affect drivers’ 
estimation of automobile speed, including the 
size of the car, its “noisiness,” and the kind 
of experience which the observer used as a
basis of estimate. In this study no attempt
was made to control for personality variables,
but such factors as size and “noisiness” of
the test car were held constant, so that each
S served as his own control within his series
of speed estimations.

1 Requests for reprints should be sent to Joseph

As to the accuracy of Ss’ estimates of speed 
there is some disagreement, with the older 
studies (Forbes, 1932; Richardson, 1916) 
finding Ss’ estimates highly unreliable and 
inaccurate, and the more recent researchers 
finding them more accurate (Barch, 1958; 
Desrosiers, 1962; Olson, Wachsler, & Bauer, 
1961; Suhr, Lourer, & Allgaier, 1958; Weis- 
man, 1964). Perhaps the older findings were 
affected by the relative unfamiliarity of Ss 
with automobiles, and by the nature of the 
then-existing automobiles and roads. Another 
factor contributing to these differences could 
be the fact that in the two older studies, Ss 
were not passengers or drivers in the auto- 
mobile whose speed they were estimating, 
whereas of the more recent studies cited 
above, all except the Desrosiers (1962) study 
had Ss either driving the test car or riding in 
it as passengers. 

Suhr, Lourer, and Allgaier (1958), Weis- 
man (1964), and Barch (1958) found that 
drivers estimate speeds rather accurately in 
the 35-45 mph range. Errors in estimation in 
this speed range are described as slight, for 
example, 1-3 mph. Following these findings, 
the present Es selected 40 mph as the target 
speed to be estimated and attempted to 
demonstrate the effects of speed adaptation 
on estimates of this speed. 

As noted by Denton (1966), speed adaptation
has never been demonstrated outside the
laboratory. Denton (1966) has developed a 
power function with a variable exponent to 
describe S’s estimates of speed in a simulated 
driving situation. This power function pre- 
dicts the speed adaptation effect that it is the 
purpose of this study to demonstrate. 

A search of the literature reveals only one 
previous study in this area. Barch (1958) 
attempted to demonstrate speed adaptation 
but reported “no evidence for speed adaptation
in the speed judgments made by drivers
while decelerating under the conditions of these 
studies [Barch, 1958].” He suggested that 
the explanation for the failure to demonstrate 
adaptation effects may have been that speed 
adaptation requires longer periods of constant 
speed, constant speed higher than the rates 
used, or both longer periods and higher speeds. 
Barch’s Ss drove 20 mi. at 50 mph with an 
average of 1.62 min. between estimates. In 
a second experiment, Barch (1958) increased 
driving time between estimates to 8 min. by 
reducing the number of estimates from 13 to 6. 
Again the adapting speed was 50 mph. How- 
ever, the data still did not reveal significant 
adaptation effects. It was the conclusion of 
the present Es that the demonstration of
speed adaption requires both a higher adapting 
speed and a longer period of driving at the 
adapting speed. 


METHOD
Subjects 


The Ss were 10 male undergraduate students from 
introductory psychology courses, ranging from 18-20 
in age and from 2-4 yr. in driving experience. Aver- 
age yearly mileage for the past 2 yr. ranged from 
8,000 to 25,000 miles per year (mpy) with the 
median at 13,000 mpy. Each had obtained his 
license to drive at the age of 16. None was a pro- 
fessional driver, though one had driven a delivery 
truck for a time. 


Apparatus 


The automobile employed was a 1962 Chevrolet 
two-door hardtop equipped with a manual trans- 
mission and a six-cylinder engine. The car registered 
59,000 mi. on the odometer. The regular speedom- 
eter was disconnected and the regular speedometer 
cable removed. A special speedometer cable, longer 
than standard, was installed and connected to a 
Stewart-Warner speedometer. This experimental 
speedometer was mounted inside a small cardboard 
box, allowing E, who rode in the front seat opposite
S, to permit S to read his speed or not as the
situation required, simply by positioning the box at 
the appropriate angle. A large red and white sign 
reading “Caution” was mounted on the rear of the 
test car for safety purposes. 

The main test area was a 50-mi. concrete four- 
lane, divided, limited access highway with a speed 
limit of 70 mph. The secondary test area was a level 
half-mile section of secondary asphalt road. 


Procedure 


Before entering the test car each S was given a 
copy of the following instructions to read: 


This is an experiment to see how well the aver- 
age driver can estimate the speed of the auto- 
mobile he is driving. You will be asked to drive 
over a predetermined course and to make four
estimates of your speed. You will be told at 
what speed to drive between estimates and will 
be allowed to use the speedometer to aid in 
setting the requested speeds. Estimates will be 
made in the following manner: As you drive 
along at the requested speed, E will ask you to 
adjust your speed to a certain level (target 
speed) without aid of the speedometer. If you 
must decelerate to reach the target speed, do 
not use the brakes. Merely take your foot off the 
accelerator and allow the automobile to slow 
to the desired speed. When you feel that you 
have hit the desired speed, indicate this to E 
by saying the word “Now!” You should try to 
adjust to the target speed as fast as possible 
but accuracy is more important than speed. If 
you feel that the automobile has slowed to a 
rate of speed below the target speed, use the 
accelerator to coax the car up to what you feel 
is the requested speed. 


Each S was then quizzed by E to assure that he
understood the instructions. There was a total of
four estimates per S. Each S was tested separately,
with S driving during the actual speed estimations
and E driving the distance between the two test
sites (5.5 mi.).

Judgment 1. The first part of the study required
each S to accelerate from a dead stop and to estimate
40 mph by saying “Now” when he judged the test
car to be traveling at that speed. During this time
the speedometer was visible to E, who recorded the
estimates to the nearest mile, but not to S. After
Judgment 1, S and E changed places and E drove
to the main test area at 48-55 mph.

Judgment 2. At the main test area, the test car 
was stopped for 4 min. on the roadside, during 
which time S was instructed as to what to do at the 
end of the 4 min. At the end of the waiting period, 
S accelerated to 70 mph, held that speed for 5 sec., 
and then dropped, on signal from E, to the speed 
S judged to be 40 mph. 

Judgments 3 and 4. After Judgment 2, S was instructed
to accelerate to 70 mph and to hold that
speed until he received further instructions from E.
E assisted S in maintaining 70 mph by requests to
increase or decrease speed. The tolerance range was
68-72 mph. (Additional requests were made near
points of estimation in an attempt to keep the speed
within the 69-71 range.) After 20 mi. (as measured
by the odometer) at 70 mph, S was requested to
drop to an estimated 40 mph. Then S accelerated to
70 mph again, maintained that speed for 20 more
miles, and again made an estimate of 40 mph.


TABLE 1

MEAN ACTUAL SPEEDS GIVEN AS ESTIMATES OF 40
MPH AND SDs OF ACTUAL SPEED ESTIMATES
WITHIN TRIALS

Trial    Mean estimate    SD
1        41.4             5.24
2        44.5             5.94
3        50.5             3.50
4        53.4             5.42

Note. N = 10, with each S giving one estimate on each trial.

Design. Judgment 1 provided a measure of speed 
estimation under no adaptational effects or adapta- 
tion at 0 mph. Judgment 2 was a measure of the 
effects of minimal (5 sec.) exposure to 70 mph on 
the estimation of 40 mph. Judgment 3 was designed 
to furnish a measure of the effects of driving 70 
mph for 20 mi. on the estimation of 40 mph, and 
Judgment 4 was a measure of the effects of driving 
at 70 mph for 40 miles. This last statement assumes, 
along with Barch (1958), that momentary decelera- 
tions from the adapting speed do not significantly 
reduce any ongoing adaptation process. 
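The exposure conditions are stated in miles, whereas Barch's (1958) intervals between estimates are stated in minutes (1.62 min. and 8 min. at 50 mph). A small conversion, sketched below, puts the two on the same footing. It assumes the adapting speed was held at exactly 70 mph; the function name is illustrative.

```python
def minutes_at_speed(miles, mph):
    """Driving time in minutes needed to cover a distance at a constant speed."""
    return 60.0 * miles / mph


# Cumulative exposure before Judgments 3 and 4, assuming a steady 70 mph.
for label, miles in [("Judgment 3", 20), ("Judgment 4", 40)]:
    print(label, round(minutes_at_speed(miles, 70), 1), "min at 70 mph")
# Judgment 3 17.1 min at 70 mph
# Judgment 4 34.3 min at 70 mph
```

On this reckoning the longer exposures here were roughly 17 and 34 min., several times the longest interval used by Barch.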

Analysis. A single factor repeated measures ANOVA 
was carried out on the data. Eta and r were also 
computed. 
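For readers unfamiliar with the single-factor repeated measures design, the partition of the sums of squares can be sketched as follows. The individual speed estimates are not reported, so the data matrix is hypothetical, and the function is an illustration of the standard partition rather than the authors' own computation.

```python
def repeated_measures_anova(data):
    """Single-factor repeated measures ANOVA.

    data[s][t] is the score of subject s under treatment t; every subject
    is measured once under every treatment.
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of treatments
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    treat_means = [sum(row[t] for row in data) / n for t in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_between = k * sum((m - grand) ** 2 for m in subj_means)
    ss_within = ss_total - ss_between
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    ss_resid = ss_within - ss_treat

    df_treat = k - 1
    df_resid = (n - 1) * (k - 1)
    ms_treat = ss_treat / df_treat
    ms_resid = ss_resid / df_resid
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_treat": ss_treat,
        "ss_resid": ss_resid,
        "df_treat": df_treat,
        "df_resid": df_resid,
        "F": ms_treat / ms_resid,
    }


# Hypothetical speeds (mph) for three drivers on four trials.
example = [
    [40, 44, 50, 53],
    [42, 45, 49, 55],
    [39, 46, 52, 51],
]
result = repeated_measures_anova(example)
print(result["df_treat"], result["df_resid"], round(result["F"], 2))
```

Because each S serves as his own control, differences between people are removed from the error term before the treatment effect is tested.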


RESULTS 


Table 1 presents the mean actual speeds 
given as estimates of 40 mph on each of the 
four trials, along with standard deviations 
within each trial. 

Table 2 presents the results of the repeated 
measures ANOVA. The F ratio is highly sig- 
nificant. The Pearson r between treatment 
conditions and speed estimates was .71. Eta 
was .72. 


DISCUSSION 


As can be seen from Table 2, there was a 
very significant effect due to the speed adapta- 
tion treatment conditions. Although under- 
estimation occurred under all treatment con- 
ditions, Ss showed a strong tendency increas- 
ingly to underestimate their speeds as exposure 
to the 70 mph adapting speed increased. A
comparison of the magnitudes of eta (.72)
and r (.71) shows that the relationship between
speed estimates and treatment conditions
is almost perfectly linear. Since no
attempt was made to distribute the treatment
conditions at equal intervals along a scale,
this linearity is entirely fortuitous. Eta is a
better measure of the relationship of interest 
here. 
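The F ratio and eta can be retraced from the sums of squares given in Table 2. The sketch below assumes that eta was computed as the square root of the treatment sum of squares over the total sum of squares; this assumption reproduces the reported value of .72. Variable names are illustrative.

```python
import math

# Sums of squares as reported in Table 2.
ss_between_people = 196.3
ss_treatments = 900.1
ss_residual = 620.8
n_subjects, n_trials = 10, 4

df_treat = n_trials - 1
df_resid = (n_subjects - 1) * (n_trials - 1)

ms_treat = ss_treatments / df_treat
ms_resid = ss_residual / df_resid
f_ratio = ms_treat / ms_resid

# Total variation is the sum of the three components above.
ss_total = ss_between_people + ss_treatments + ss_residual
# Correlation ratio (assumed definition): share of total variation due to treatments.
eta = math.sqrt(ss_treatments / ss_total)

print(df_treat, df_resid, round(f_ratio, 2), round(eta, 2))  # 3 27 13.05 0.72
```

Eta makes no assumption about the spacing or ordering of the treatment conditions, whereas r measures only the linear trend across them, which is why the two need not agree in general.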

In view of these highly significant results, 
Barch’s (1958) speculation to the effect that 
his failure to demonstrate speed adaptation 
effects was due to his use of too short periods 
of constant speed and/or too low constant 
speeds is credible. 

Questioned after the study, all Ss reported 
previous knowledge of speed adaptation and 
all stated that they made conscious adjustments
to offset adaptation effects. One S
explained to E after the test, “You have to 
seem to be crawling to do 40 mph after 
driving at 70 mph.” An important question is 
whether or not drivers make conscious ad- 
justments for speed adaption in everyday 
driving. It may well be that Ss in this study 
made adjustments because of a temporary 
desire, induced by the test situation, to be 
accurate in their estimates. This adjustment- 
inducing factor may not exist in day-to-day 
driving. The fact that there is a relatively 
high rate of accidents at the terminations of 
highways (Department of Scientific and In- 
dustrial Research, 1962) indicates that, if 
such adjustments are made in everyday driv- 
ing, they are not complete or accurate enough 
to overcome the effects of speed adaptation. 
Speed adaptation may be impossible to measure
precisely in its normal context. Something
like the principle of indeterminacy
in physics may be operative here: By the
very fact of measuring drivers’ behavior, E
may be inducing changes in that behavior.
However, one need not measure this phenomenon
that finely to demonstrate its existence,
as has been shown in the present
study.


TABLE 2

SUMMARY OF ANOVA

Source of variance    SS        df    MS        F
Between people        196.3     9     21.81
Within people         1520.9    30    50.69
  Treatments          900.1     3     300.03    13.05*
  Residual            620.8     27    22.99
Total                 1717.2    39

* p < .01.

Speed adaption as demonstrated in this 
study is a factor that highway and traffic engi- 
neers should take into account when designing 
certain parts of our transportation network, 
for example, exit and entrance ramps on 
high-speed roadways, curves at the end of 
long stretches of straight road, the setting 
of speed limits, etc. As more is learned about 
this phenomenon, specific recommendations 
will undoubtedly be formulated. At this point, 
however, the real need is for further research 
in this area.


Two experiments were conducted. The first compared a combination of group
and individual brainstorming with simply individual brainstorming. The second 
contrasted three problem-solving procedures: critical group problem solving, 
group brainstorming, and individual brainstorming. In Experiment II, all three 
procedures were divided into feedback and nonfeedback conditions. Feedback 
consisted of having Ss listen to the first third of their performance and then 
continue to work on the remainder. Performance under all conditions was cor- 
related with personality variables derived from the California Psychological 
Inventory (CPI), the Firo-B, the Myers-Briggs Type Indicator, a vocabulary 
test, and five factors derived from a factor analysis of the CPI. Experiment I 
indicated that there is no difference between a combination of group and indi- 
vidual brainstorming and simply individual brainstorming. Experiment II indi- 
cated that individual brainstorming is superior to group brainstorming which 
is superior to group critical problem solving. Feedback had no effect on per- 
formance within procedures. The CPI Sociability scale and the first factor of 
the CPI were shown to be consistently related to performance under group- 
problem-solving conditions. 


During the last 15 years there has been a 
rapid growth of research interest in the facili- 
tation of creative or original thinking, and 
most of this interest has focused on the in- 
dividual (Barron, 1965; Golann, 1963; Mac- 
Kinnon, 1962; Stein & Heinze, 1960; Taylor 
& Barron, 1963). Another area of potentially 
equal importance, but less rapid growth, has 
been that of creativity in groups. In spite of 
the tremendous surge of research on the small 
group (McGrath & Altman, 1966) few studies 
have focused on processes or procedures for 
the facilitation of creative or original thinking 
in problem-solving groups. In their review of 
studies contrasting the quality of group per- 
formance and individual performance, Lorge, 
Fox, Davitz, and Brenner (1958) excluded 
the consideration of group process as such. 
Kelley and Thibaut (1954), in their review 
of experiments on group problem solving and 
process, spent less than one page on the effects 
of formal group-problem-solving procedures, 
and cited no relevant studies. Recent reviews 
by Hoffman (1965) and Maier (1967) cite a 
few relevant studies and deal with the ques- 
tion at some length. 

Most of the studies that have been con- 
ducted have simply dealt with the contrast 
between the individual and the groups. This 
is clearly an inadequate formulation of the 
problem. Whether or not group problem solv- 
ing is superior to individual problem solving, 
it is clear that group approaches to problems 
requiring creative solutions are becoming more 
and more necessary as the accumulation and 
fractionation of knowledge increases. There- 
fore, a question such as “Is individual brain- 
storming superior to group brainstorming?” 
is misplaced. Rather, one should ask such 
questions as “Under what conditions will 
which method solve which sort of problems?” 
“Which type of people work best using which 
methods?” “What is the optimum combina- 
tion of these methods?” “What is the best 
combination of group and individual work?” 
In all of these cases, however, a “nominal” 
group made up of the nonoverlapping scores 
of an equal number of individuals who work 
on the same problem for the same amount of 
time can serve as a very useful base line for 
evaluating and understanding the effects of 
a particular technique. The studies to be pre- 
sented in this monograph have attempted to 
answer some of the above questions. Four pro- 
cedures, group brainstorming, a combination 
of group and individual brainstorming, group 
critical problem solving, and individual brain- 
storming have been studied in some detail and 
individual differences have been examined 
within procedures. 
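
The "nominal" group base line described above can be sketched in a few lines. This is only an illustration under stated assumptions: ideas are matched by normalized text, whereas in practice judges decide which ideas overlap, and the function name and sample protocols are hypothetical.

```python
def nominal_group_score(individual_idea_lists):
    """Count the nonoverlapping ideas pooled across Ss who worked alone."""
    pooled = set()
    for ideas in individual_idea_lists:
        # Assumption: two ideas overlap when their normalized text is identical.
        pooled.update(idea.strip().lower() for idea in ideas)
    return len(pooled)


# Hypothetical protocols from four Ss who worked individually.
alone = [
    ["doorstop", "paperweight"],
    ["paperweight", "bookend", "hammer"],
    ["doorstop"],
    ["bookend", "garden border"],
]

print(nominal_group_score(alone))  # 5
```

The score of a real four-person group on the same problem and time limit can then be compared with this pooled count.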


EXPERIMENTAL STUDIES OF GROUP-PROBLEM- 
SOLVING PROCEDURES 


Experimental studies of group-problem- 
solving procedures have been reviewed a num- 
ber of times (Hoffman, 1965; Salvatore, 
Willis, & MacKinnon, 1966). Here only those 
studies directly relevant to the experiments 
which follow will be discussed. These studies 
fall into three broad categories: (a) studies 
comparing group versus individual problem 
solving, (b) studies comparing different forms 
of group problem solving, (c) studies of brain- 
storming by individuals. 


The first study comparing individual and 
group brainstorming was conducted by Tay- 
lor, Berry, and Block (1958). The study was 
designed to investigate “whether group par- 
ticipation when using brainstorming facili- 
tates or inhibits creative thinking.” They 
found that although the mean score (numbers 
of ideas) for 12 groups of four Ss each was 
higher than the mean score for 48 individuals, 
the mean score for 12 nominal groups (com- 
bined output of four randomly selected Ss 
who worked as individuals) was higher than 
the mean score of the 12 real groups. The 
nominal groups also produced more unique 
and higher quality responses than the real 
groups. Analysis of covariance showed that 
this was due almost entirely to the larger 
number of ideas. Taylor et al. (1958) con- 
cluded that “To the extent that the results 
can be generalized, it must be concluded that 
group participation when using brainstorming 
inhibits creative thinking [p. 43].” 

Dunnette, Campbell, and Jaastad (1963) 
repeated Taylor et al.’s study using a modi- 
fied design which allowed the same individ- 
ual to participate in both the individual and 
group brainstorming sessions. The experiment 
was performed twice, using research scientists 
in one sample and advertising men in the 
other. The results showed that individual 
brainstorming is superior to group brain- 
storming in both quality and quantity of 
ideas and that the largest number of ideas is 
produced under individual brainstorming con- 
ditions after a group session. One of the major 
difficulties with this study is that a female E 
was present during the brainstorming session. 
This may have had an inhibiting effect over 
and above the fact that it was a group situa- 
tion. 

Parnes and Meadow (1963) have reported 
briefly on a study similar to the one reported 
by Taylor et al. (1958). They compared 
nominal groups of individuals who worked 
under conventional (critical thinking calling 
for quality solutions as opposed to quantity) 
procedures with real groups using deferred 
judgment. The real groups using deferred 
judgment were significantly more productive 
of good ideas than the nominal groups using 
critical problem solving. In a second com- 
parison the individuals who made up the 
nominal groups used deferred judgment. In 
this case, the results failed to replicate Taylor 
et al. since there was no significant difference 
between the two conditions. 

A study by Taylor and Block (1957) was 
designed to answer the question “Should 
group or individual work come first on prob- 
lems requiring creative thinking when equal 
time is devoted to each?” Arguing that if the 
group work came first, a common set might 
reduce the variety and number of ideas pro- 
duced in response to the problem, they di- 
vided 72 Ss into two groups and had them 
work on nine problems over a 4-day period. 
One group spent the first half of their time 
working on a problem alone and the other 
half working on the same problem in a group 
of three. The other half of the Ss followed 
the reverse procedure. The instructions em- 
phasized critical thinking (quality) and quan- 
tity; they were not brainstorming instruc- 
tions. It should be noted that all responses 
under both conditions were written and not 
tape-recorded as in the previous experiment 
(Taylor et al., 1958). The results indicated 
that it did not make any difference in num- 
ber of ideas produced whether the individual 
or group work came first. The 12 groups of 
three, whose members worked alone first, were 
combined into nominal groups and compared 
with the 12 groups of three who had worked 
as real groups for the same period. The re- 
sults showed that “Under these conditions, 
group participation inhibits rather than facili- 
tates the production of ideas [Taylor & Block, 
p. iv].” These results corroborated the earlier 
study. They do differ, however, from those 
of Dunnette et al. (1963), who showed that, 
under brainstorming conditions, individual 
brainstorming is superior after, rather than 
before, group brainstorming. 

Although Tuckman and Lorge (1962) do 
not deal with creative problem solving in the 
sense it is being used in this paper, they have 
introduced a paradigm that is applicable to 
the individual versus groups comparison under 
discussion. They argue that if a group is a 
more effective problem-solving unit than the 
individual, groups should generate a product 
better than the best ideas of its individual 
members. To test this hypothesis, they had 
70 individuals work on the Mined Road 
Problem alone, and then as five-man groups 
in a re-solving situation. As a control condi- 
tion, 70 groups of five men also worked ini- 
tially as groups. When a comparison of means 
was made, the re-solving groups performed 
better than their members had as individuals. 
However, when composite scores of the best 
ideas given under individual conditions in 
each of the groups were compared with their 
re-solving scores, the composite solutions were 
superior to the re-solving scores. A compari- 
son between the re-solving groups and control 
groups that simply worked initially as a group 
yielded no significant difference suggesting no 
practice effect. Tuckman and Lorge concluded 
that “The group does not even incorporate 
or summate the best ideas of its members 
[p. 49].” 

Another worthwhile variation on the prob- 
lem of individual versus group comparison 
has been introduced by Campbell (1968). 
He had second- and third-line managers work 
on N. R. F. Maier’s (1967) “Change of Work 
Procedure” problem under three conditions: 
(a) individual solutions, (b) individual solu- 
tions after a quasi brainstorming session, and 
(c) group consensus (four-man groups) in 
that order. No significant change in the aver- 
age individual solution score was found fol- 
lowing the quasi brainstorming session. In- 
dividual solutions combined into “nominal” 
group scores, using either average scores or 
a composite of the best elements for each of 
the four solutions, were significantly better 
than the real groups or average individual 
solutions. 

We now turn to studies comparing various 
forms of group problem solving. Weisskopf- 
Joelson and Eliseo (1961) compared brain- 
storming instructions and critical instructions 
on such simple tasks as creating brand names 
for a cigar, a deodorant, and an automobile. 
Using seven persons per group (mixed sex), 
they compared three brainstorming and three 
critical groups. The brainstorming groups 
produced more responses (quantity), but the 
critical groups produced responses with a 
higher mean quality. Cumulative frequency 
distributions of the quality scores demon- 
strate that at the high quality end of the 
distribution the two procedures do not dif- 
fer in the number of responses produced. At 
the low quality end the brainstorming pro- 
cedure produces more responses than the 
critical procedure. 

Parloff and Handlon (1964) compared con- 
genial and uncongenial dyads under brain- 
storming and critical conditions. Working on 
the hypothesis that perhaps brainstorming 
groups did not generate more good ideas than 
critical groups, but simply reported more of 
them, they had Ss write down their ideas and 
they also tape-recorded each session. They 
found that more good solutions were written 
under brainstorming conditions than critical 
conditions and that brainstorming groups gen- 
erated (spoke and wrote) more solutions but 
not more good solutions than critical groups. 
Congeniality had no effect. They suggest 
“that the suspension of critical judgment may 
simply lower the subjects’ standards for re- 
porting ideas without substantially increasing 
their repertoire of “good” ideas [p. 25].” 

This experiment differed from most of the 
others in one important respect. The Ss were 
made to work on the problem individually 
prior to their dealing with it in the group, 
in order to exhaust their store of answers. In 
this way, it was expected that any solutions 
generated would then be new and unique to 
the group. This may be true, but if one as- 
sumes that the active sharing of each other’s 
ideas, whether they were part of the individ- 
ual’s repertoire prior to the interaction or not, 
is an essential aspect of brainstorming, then 
in a real sense this experiment fails to grasp 
an important aspect of the process itself. 
Along these same lines, it should be noted 
that the experimental units were female dy- 
ads. There is good reason to believe that dy- 
ads are a unique kind of group and general- 
izations about them to larger groups are 
tenuous (Thomas & Fink, 1963). 

Brilhart and Jochem (1964) compared 
three different group-problem-solving proce- 
dures. The three different procedures used 
were as follows: Procedure I consisted of three 
steps: (a) analysis of the problem, (b) brain- 
storming using Osborn’s instruction, and (c) 
setting up of standards and criteria to evalu- 
ate ideas. Procedure II was the same as I 


except Steps b and c were reversed. Proce- 
dure III consisted of the following two steps: 
(a) analysis of the problem and (b) genera- 
tion of solutions and evaluation of their rela- 
tive merits at the same time (e.g., critical 
problem solving). 

In terms of total number of ideas, Patterns 
I and II did not differ from each other, but 
they were both significantly superior to Pat- 
tern III. Using a more stringent criterion and 
eliminating ideas not rated above a level at 
which ideas were considered “good,” Pattern 
I was better than Pattern III. But this differ- 
ence was only marginally significant. The 
mode of recording responses in this experi- 
ment differed from most of the others. Dur- 
ing each procedure a “recorder” wrote the 
gist of each idea on the blackboard. This 
mode is more similar to writing than it is to 
tape-recording. 

Brilhart and Jochem concluded that pat- 
terns of problem solving which separate 
ideation and evaluation are superior to pat- 
terns which combine them. They comment 
that “evidently the emphasis usually placed 
on value and quality during a problem-solving 
discussion can dampen the expression of ideas 
of potential merit [p. 179].” This interpreta- 
tion was also supported by Ss’ responses to 
the questions “If you were to lead the dis- 
cussion of a similar problem, which of the 
three sequences would you most prefer 
to use, least prefer to use?” Signifi- 
cantly more Ss chose I and III over II. 

The following studies by Parnes and Mea- 
dow have evaluated the effects of brainstorm- 
ing on individual creativity. All studies used 
a written mode for recording responses. 
Parnes and Meadow (1959) compared in- 
dividuals under brainstorming and critical 
instructions. They found more good quality 
ideas under brainstorming instructions than 
under critical instructions. A second com- 
parison between Ss trained in a creative prob- 
lem-solving course emphasizing brainstorm- 
ing and nontrained Ss, where both groups 
worked under brainstorming conditions, 
showed that the trained Ss produced a sig- 
nificantly greater number of good quality 
ideas than the untrained Ss. A third finding 
was a positive correlation between total quan- 
tity and number of “good” ideas produced 
under all conditions. This was interpreted as 
suggesting that “The efficacy of brainstorm- 
ing in producing an increment in good ideas 
is possibly the result of the increased quan- 
tity of ideas encouraged by the method [p. 
176].” The question of the relationship be- 
tween quantity and quality of ideas has also 
been discussed by Hyman (1960, reported by 
Parloff & Handlon, 1964) and Parloff and 
Handlon (1964). Both Parloff and Handlon 
and Hyman found that the proportion of 
“good” ideas did not vary with the number 
of ideas. In fact, the correlation between 
quantity and quality reported by Parnes and 
Meadow (1959) is to a large extent a statis- 
tical artifact due to the lack of independence 
between the two variables. 
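
The part-whole artifact mentioned above can be illustrated with a small simulation. The sketch below is not a reanalysis of any of the cited data: it assumes every idea has the same fixed chance of being rated "good" regardless of how many ideas an S produces, and all names and parameter values are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_subjects=200, p_good=0.3, seed=0):
    """Return (r of total with good count, r of total with proportion good)."""
    rng = random.Random(seed)
    # Assumption: total output per S varies widely, quality per idea does not.
    totals = [rng.randint(10, 60) for _ in range(n_subjects)]
    goods = [sum(rng.random() < p_good for _ in range(t)) for t in totals]
    proportions = [g / t for g, t in zip(goods, totals)]
    return pearson(totals, goods), pearson(totals, proportions)


r_count, r_proportion = simulate()
print(round(r_count, 2), round(r_proportion, 2))
```

Under these assumptions the number of good ideas correlates substantially with total quantity simply because it is a part of that total, while the proportion of good ideas is essentially unrelated to quantity.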

A second study by Meadow and Parnes 
(1959) was designed to evaluate training in 
the creative problem-solving course mentioned 
above. Under neutral testing conditions (Ss 
were not told to brainstorm) trained Ss were 
shown to be superior to a matched (IQ) con- 
trol group, on a variety of measures. A fur- 
ther study indicated that the effects of this 
course persisted for at least 8 mo. (Parnes & 
Meadow, 1960). 

Meadow, Parnes, and Reese (1959) com- 
pared the effects of brainstorming and critical 
instructions on individual problem solving. 
Four groups of eight Ss worked (individu- 
ally) on one problem under each condition 
in a counterbalanced design. They found 
significantly more “good” solutions (rated 
in terms of uniqueness and value) under the 
brainstorming instructions. They also found 
more good solutions under brainstorming in- 
structions when they came before critical in- 
structions, than after. 


PERSONALITY AND GROUP PERFORMANCE 


This section deals with the interaction be- 
tween personality and performance in small 
groups. This relationship has been dealt with 
in two complementary ways. The first ap- 
proach assumes that certain personality char- 
acteristics or personality syndromes facili- 
tate effective group performance, and the 
larger the number of group members who 
share these characteristics, the less likelihood 
of interpersonal conflict and the more effec- 
tive the group is likely to be, given a fixed 


level of ability (Cattell, Saunders, & Stice, 
1953; Fiedler, Meuwese, & Oonk, 1961; 
Grace, 1954; Haythorn, 1953; Schutz, 1958). 

The second approach favors heterogeneous 
rather than homogeneous groups, but the na- 
ture of the heterogeneous groups varies 
widely. Hoffman and Maier (1961) argue 
that heterogeneous groups defined by low 
personality profile correlations (nonspecific 
heterogeneity) tend to produce members with 
substantially different perspectives on the 
problem, thereby increasing the probability 
of good solutions or more solutions. Ghiselli 
and Lodahl (1958), on the other hand, found 
that a skewness measure which reflected the 
fact that one S in the group was consider- 
ably more dominant than the next most dom- 
inant member correlated higher with group 
performance than the average amount of a 
trait possessed by the group or the amount 
possessed by the highest scorer (Hoffman, 
1959; Hoffman & Clagett, 1960; Pelz, 1956). 

The mixed results found in these studies 
indicate that there is perhaps some value in 
both of the procedures. However, these re- 
sults also indicate that perhaps the prob- 
lem has been inadequately conceptualized. 
Clearly if the one approach assumes that 
certain personality syndromes facilitate ef- 
fective group performance, and that all mem- 
bers should share this syndrome, then gross 
profile correlations or simple summation 
scores across unselected traits are inadequate 
to reflect or operationalize the hypothesis. 
The first step in an adequate conceptualiza- 
tion of the problem requires knowledge of the 
personality characteristics of effective group 
problem solvers. The second step requires ex- 
perimentation with both types of groups, 
homogeneous and heterogeneous, with re- 
spect to these particular characteristics. One 
should not forget the very strong possibility 
that personality type may interact with both 
problem type and group-problem-solving pro- 
cedure. For example, a shy, submissive, cau- 
tious, and methodical person is unlikely to be 
as effective in a brainstorming group dealing 
with a human relations problem as he would 
be in a highly structured group dealing with 
strictly objective types of problems. 

With regard to the first step mentioned 
above, Mann (1959) and Heslin (1964) have 
reviewed a number of studies relating a vari- 
ety of personality measures to various small- 
group performance measures in an attempt to 
delineate the characteristics of effective 
group workers, but success along these lines, 
although encouraging, has been limited. The 
conclusion of both reviewers is that the best 
predictors of an individual’s performance in 
a group are ability (general and specific) 
and general adjustment. Mann (1959) re- 
ports that “In no case is the median corre- 
lation between an aspect of personality cov- 
ered here and performance higher than .25 
and most of the median correlations are 
closer to .15 [p. 266].” Heslin (1964), who 
reports somewhat higher individual relation- 
ships, concludes that “The relations of six 
different personality categories to group per- 
formance has been reviewed. The direction of 
the relationships has usually been clearly 
indicated, but these relationships are weak 
for predictive purposes [p. 254].” The gen- 
erality of these conclusions and the low cor- 
relations reported limit the usefulness of this 
information with respect to the assignment 
of talent and points out that more precise 
predictive procedures are necessary if effec- 
tive problem-solving groups are to be pre- 
selected. 

The task of bringing the relationships re- 
ported above to a level of more practical and 
perhaps theoretical importance can be ap- 
proached in a number of ways. First, charac- 
teristics of other groups’ members and the 
group structure can be manipulated in such 
a way that only certain types of personalities 
can perform effectively. Examples of this 
approach are studies which vary the social 
characteristics of other members (Breer, 
1960), role requirements (Smelser, 1958; 
Speisman and Moos, 1962), and communi- 
cation networks (Leavitt, 1951). This ap- 
proach increases our ability to predict per- 
formance in one situation by decreasing the 
range of behavior which we attempt to pre- 
dict. Thus an increase in precision is accom- 
panied by a loss in generality and potential 
usefulness of the information at hand. A 
second approach consists of increasing the 
complexity of the predictor variables (Cat- 
tell & Stice, 1954). The use of a moderately 
complex predictor to predict a complex cri- 
terion (productivity in a group) could per- 
haps increase both the generality and use- 
fulness of the obtained measures. 

In order to conduct any of the four kinds 
of studies mentioned above (studies of homo- 
geneity, heterogeneity, manipulation of group 
structure, or complex predictors), more sys- 
tematic knowledge of the relationship be- 
tween personality variables and perform- 
ance under a variety of systematically manip- 
ulated conditions would be very useful. 

More generally, however, the motivation 
for studying personality variables in the 
following experiments reflects Cronbach’s 
(1957) view that “Ultimately we should 
design treatments, not to fit the average 
person, but to fit groups of students with 
particular aptitude patterns. Conversely we 
should seek out the aptitudes which corre- 
spond to (interact with) modifiable aspects 
of the treatment [p. 681].” 


EXPERIMENT I 


Taylor et al. (1958) and Dunnette et al. 
(1963) established that real brainstorming 
groups are inferior to “nominal” brainstorm- 
ing groups. The Dunnette et al. data sug- 
gest that an optimal order for combining 
individual and group work would be group 
problem solving followed by individual work. 
This suggestion was based on data obtained 
from different problems under each of the 
conditions. This excluded any test of the 
possibility that sets established during group 
discussion might carry over into individual 
sessions and limit the range of ideas con- 
sidered. Elsewhere, Dunnette (1964) has 
suggested that the group section of a com- 
bination of group and individual problem 
solving be restricted “almost exclusively to 
a sharing of information” and that “ideas or 
suggested solutions to the problem should 
be scrupulously avoided.” Whether this pro- 
cedure would avoid the problem of sets is 
problematic, but it does insure that all rele- 
vant information will be considered. For 
many problems this procedure reduces simply 
to individual problem solving, the group ses- 
sion being useful, as Dunnette points out, 
only as a means of keeping people informed 
about decisions affecting them. 
The Campbell (1968) and Tuckman and 
Lorge (1962) studies suggest that the order, 
individual problem solving followed by group 
problem solving, has no particular value. 
Also, the Taylor and Block (1957) study 
which used nonbrainstorming conditions 
found no order effect when both the groups 
and individuals worked on the same prob- 
lems, and found very little evidence for set 
effects. If one assumes that the participants 
in a brainstorming group are exposed to in- 
formation that they would not have had 
access to had they not been in the group, 
and that they can make use of this informa- 
tion when they work individually, then in 
conjunction with the above findings there is 
reason to believe that a combination of 
group and individual work, in that order, un- 
der brainstorming conditions, may be supe- 
rior to simply individual work over the same 
period of time. The main purpose of this 
experiment is to test the above hypothesis. 
The author’s brainstorming procedure differs 
in one major respect from previous studies. 
Under individual conditions, Ss wrote down 
their responses to the problem, rather than 
verbalized them. 


Method 
Subjects 


The Ss were 48 male junior and senior students 
from an upper division psychology course. Partici- 
pation in 2 hr. of experimentation was a course 
requirement. 


Design 


The experimental design, which is similar to that 
used by Dunnette et al. (1963), is shown in Table 1. 
As in that study, conditions, order, and problem 
sets were counterbalanced. II indicates two consecu- 
tive 10-min. sessions of individual work. GI in- 
dicates 10 min. of group work followed by 10 min. 
of individual work. 


Procedure 


The experimental procedure was as follows: Ss 
met as a group and the rules of brainstorming were 
discussed. The instructions were: 


This is an experimental study of brainstorming. 
You may not be familiar with this concept so I 
will give you a brief description of what it is 
and then discuss the rules with you. Essentially, 
brainstorming is a form of group interaction 
which is used to facilitate the flow of ideas. It 
is a technique widely used in a large number 














TABLE 1
DESIGN OF EXPERIMENT I

                                     Order
Group   Individuals        First          Second
A        1,  2,  3,  4
B        5,  6,  7,  8     [illegible]    [illegible]
C        9, 10, 11, 12
D       13, 14, 15, 16
E       17, 18, 19, 20     [illegible]    [illegible]
F       21, 22, 23, 24
G       25, 26, 27, 28
H       29, 30, 31, 32     [illegible]    GI, 1
I       33, 34, 35, 36
J       37, 38, 39, 40
K       41, 42, 43, 44     GI, 2          II, 1
L       45, 46, 47, 48

Note.—1 = Problem Set 1, Thumbs and Education; 2 
= Problem Set 2, People and Tourists; G = group; I = in- 
dividual. 


of U. S. corporations. It is generally used when 
new, unique, original, and creative ideas are de- 
sired. It is not used to solve everyday problems. 
The rules of brainstorming are straightforward 
and easy to comprehend. (1) Criticism is ruled 
out: Adverse judgement of ideas must be with- 
held until later. (2) Freewheeling is welcome: The 
wilder the idea the better. It is easier to tame 
down than to think up. (3) Quantity is wanted: 
The greater the number of ideas, the more likeli- 
hood of winners. (4) Combination and improve- 
ment are sought: In addition to contributing ideas 
of their own, participants should suggest how ideas 
of others can be turned into better ideas. Are 
there any questions? 


The instructions were generally followed by a 
number of questions and the rules were elaborated. 
A demonstration tape of brainstorming was then 
played. It was an unrehearsed discussion by four 
male graduate students and dealt with the ques- 
tion “What would be the consequences if suddenly 
everyone could read everyone else’s mind?” and 
it contained many highly original and unconven- 
tional ideas. The tape served to illustrate the range 
and type of ideas a brainstorming session could 
elicit. The Ss then dealt with the following two 
questions in order to get them used to the pro- 
cedure and to each other. (a) Name as many uses 
as you can for a red brick. (b) Name as many 
uses as you can for a wire coat hanger. Each of 
these problems was dealt with for about 2 min. 
Following this the rules of brainstorming were re- 
peated. Half of the groups then worked under a 
combination of group and individual work (GI) 
first. Their time was divided into 10 min. of group 
brainstorming and 10 min. of individual brain- 
storming. Generally most of the groups’ ideas had 
been expressed at the end of 10 min.; at this time 
they were given a copy of the problem and told 
to relax, read it over and see if they could think 
of any more new ideas or solutions on their own 
(this took place in a separate room for each S). 
They were urged to be unafraid to duplicate the 
ideas which might have been given in the group 
if they had any uncertainty as to whether or not 
they had been given previously. The Ss worked 
alone for 10 min. and then were reassembled for 
Problem 2. The above procedure was then repeated. 
The combination of group and individual work was 
followed by individual work (II). The Ss were 
assigned to an individual room, given a copy of 
the experimental problem and a stack of lined 
paper headed by the instruction “Be brief. Your 
answers need not be complete sentences.” They were 
then asked to apply the principles of brainstorming 
as individuals. They worked on two problems for 
20 min. each. 

The Ss who were to work in the II condition 
first proceeded immediately from the practice prob- 
lems to the II condition and then on to the GI 
condition. All group sessions were tape-recorded. 
The same E served for all groups, and participated 
only when it was necessary to reprimand criticism 
(which seldom occurred). 

The problems used in this study are the same 
as those used by Dunnette et al. (1963). They are 


Thumbs problem. We do not think this is likely 
to happen, but imagine for a moment what 
would happen if everyone after 1966 had an 
extra thumb on each hand. This extra thumb 
will be built just as the present one is, but lo- 
cated on the other side of the hand. It faces 
inward, so that it can press against the fingers, 
just as the regular thumb does now. Here is the 
question, what practical benefits or difficulties 
will arise when people start having this extra 
thumb? 


Education problem. Because of the rapidly in- 
creasing birthrate beginning in the 1940’s, it is 
now clear that by 1970 public school enrollment 
will be very much greater than it is today. In 
fact, it has been estimated that if the student- 
teacher ratio were to be maintained at what it is 
today, 50% of all individuals graduating from 
college would have to be induced to enter teach- 
ing. Here is the question. What different steps 
might be taken to insure that schools will con- 
tinue to provide instruction at least equal in 
effectiveness to that now provided? 


People problem. Suppose that discoveries in 
physiology and nutrition have so affected the 
diet of American children over a period of 20 
years that the average height of Americans at 
age 20 has about doubled. Comparative studies 
of the growth of children during the last 5 years 
indicate that the phenomenal change in stature 
is stabilized so that further increase is not ex- 
pected. What would be the consequences? What 
adjustments would this situation require? 

Tourists problem. Each year a great many 
American tourists go to visit Europe. But now 
suppose that our country wished to get many 
more European tourists to come to visit America 
during their vacations. What steps can you sug- 
gest that would get more European tourists to 
come to this country? 


The assignment of Ss to conditions was done in 
a systematic fashion so that one group in each of 
the four orders was completed before any condi- 
tion was repeated. The three groups in each order 
were also matched as closely as possible with regard 
to the time of day during which the experiment 
was conducted. 


Results 


All ideas produced during both the group 
and individual sessions for each problem 
were transcribed. From the transcriptions a 
master list of all the different ideas for each 
problem was constructed. This list was used 
to rate the quality of ideas. All protocols 
(group and individual) were inspected to 
delete duplications within any one session. 
The protocols obtained during the individual 
part of the GI sessions were inspected to 
delete any ideas which had been previously 
expressed in the group sessions. Then the 
four protocols for each group’s II sessions 
were inspected and all the different ideas 
extracted. The first analysis consists of com- 
paring the number of different ideas or solu- 
tions contributed under the II conditions 
with the number contributed under the GI 
conditions. The means for various problems 
and conditions are shown in Table 2. 

A three-way analysis of variance (Winer, 
1962, p. 554, Plan 9) indicated that the only 
significant effect was due to problem sets. 
The Thumbs and Education problems yielded
fewer responses than the People and Tourist
problems.


TABLE 2

MEAN TOTAL NUMBER OF DIFFERENT IDEAS AND/OR
SOLUTIONS TO PROBLEMS BY Ss UNDER CONDITIONS
OF INDIVIDUAL-INDIVIDUAL AND GROUP-
INDIVIDUAL BRAINSTORMING

                           Individual-    Group-
Problems                   individual     individual
Thumbs and people             73.34         74.67
Education and tourists        61.33         56.58
Total                        134.67        131.25


TABLE 3

MEAN TOTAL NUMBER OF “GOOD” IDEAS AND/OR
SOLUTIONS PRODUCED UNDER CONDITIONS OF
INDIVIDUAL-INDIVIDUAL AND GROUP-
INDIVIDUAL BRAINSTORMING

                           Individual-    Group-
Problems                   individual     individual
Thumbs and people             37.00         30.75
Education and tourists        11.17         11.33
Total                         48.17         42.08

A second analysis was conducted to find 
out if the number of “good” answers pro- 
duced under II or GI conditions was sig- 
nificantly different. The scales used for rating 
quality of ideas were the same as those used 
by Dunnette et al. (1963) and Taylor et al. 
(1958). The effectiveness scale was used to 
rate the responses to the real problems (Edu- 
cation and Tourists) and the probability 
scale was used to rate responses to the imagi- 
nary problems (Thumbs and People). These 
scales are presented in Appendix A. The 
author and a second judge rated all problems. 
The sum of the two ratings was used as an 
estimate of a response’s quality. Interrater 
reliabilities were computed on approximately 
one-third of the responses to each problem. 


TABLE 4

MEAN QUALITY RATINGS FOR IDEAS AND/OR SOLUTIONS
PRODUCED UNDER CONDITIONS OF INDIVIDUAL-
INDIVIDUAL AND GROUP-INDIVIDUAL
BRAINSTORMING

                           Individual-    Group-
Problems                   individual     individual
Thumbs and people              4.31          3.86
Education and tourists         2.79          2.85
Total                          7.10          6.71





They were .67 and .59 for the Education and 
Tourists problems, and .62 and .55 for the 
People and Thumbs problems. Although not 
high, these reliabilities are adequate and in
the same range as those reported by Dun- 
nette et al. (1963). Any response with a 
total rating of 5 or better (sum of two 
raters) was considered “good.” The means 
for various problems and conditions are 
shown in Table 3. An analysis of variance 
indicated that the only significant effects 
were due to problem sets. The Thumbs and 
Education problems yielded more good re- 
sponses than the People and Tourists prob- 
lems. 
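
The "good" criterion described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the scoring procedure actually used: the function name and the sample ratings are hypothetical, and the only elements taken from the text are the sum of two judges' ratings and the cutoff of 5.

```python
def count_good_ideas(ratings_judge_1, ratings_judge_2, cutoff=5):
    """Count ideas whose summed rating (two judges) reaches the cutoff."""
    if len(ratings_judge_1) != len(ratings_judge_2):
        raise ValueError("each idea needs one rating from each judge")
    # The quality estimate for an idea is the sum of the two ratings.
    summed = [a + b for a, b in zip(ratings_judge_1, ratings_judge_2)]
    return sum(1 for total in summed if total >= cutoff)


# Hypothetical ratings for four ideas (not data from the study).
judge_1 = [3, 2, 4, 1]
judge_2 = [2, 2, 3, 1]
good = count_good_ideas(judge_1, judge_2)
```

With these made-up ratings the summed values are 5, 4, 7, and 2, so two of the four ideas meet the cutoff.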

The next analysis deals with the mean 
quality of ideas and/or solutions. The ap- 
propriate comparisons are shown in Tables 
4 and 5. 


TABLE 5

ANALYSIS OF VARIANCE: MEAN QUALITY RATINGS FOR IDEAS AND/OR SOLUTIONS TO PROBLEMS UNDER
CONDITIONS OF INDIVIDUAL-INDIVIDUAL AND GROUP-INDIVIDUAL BRAINSTORMING

                            Total (both problems    Thumbs and people    Education and tourists
                                of each set)             problems               problems
Source                 df      MS          F           MS         F           MS          F
Between individuals
  Order (O)             1     2.43       4.50         1.21      8.07*         .22
  S × C                 1      .18                     .14                    .00
  S × O × C             1  [illegible]                 .13                    .82        1.86
  Error (b)             8      .54                     .15                    .44
Within individuals
  Condition (C)         1      .91       8.27*        1.21      6.05*         .02
  C × O                 1      .00                     .12                    .17        2.43
  Set (S)               1     1.06       9.64*         .02                    .78       11.14*
  S × O                 1      .60       5.45*         .00                    .56        8.00*
  Error (w)             8      .11                     .20                    .07

*p < .05.







The II condition was clearly superior to 
the GI condition when both problems of each 
set were combined. The set effect was due to 
higher mean quality scores for the Thumbs 
and Education problems than the People 
and Tourist problems (7.11 versus 6.69). 
The significant Set × Order interaction shows
that this difference is due to inferior per- 
formance on the People and Tourists prob- 
lems under the order II-GI. 
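
Each F ratio in Table 5 is an effect mean square divided by the matching error mean square, with between-individuals effects tested against the between error term and within-individuals effects against the within error term. The sketch below is illustrative only: the function names are hypothetical, the entries are read from Table 5 on the assumption that the dropped decimal points belong where shown, and the critical value 5.32 is the standard tabled F for 1 and 8 degrees of freedom at the .05 level.

```python
# Tabled critical value of F for df = (1, 8) at the .05 level.
F_CRIT_1_8 = 5.32


def f_ratio(ms_effect, ms_error):
    """F is the effect mean square over the appropriate error mean square."""
    return ms_effect / ms_error


def is_significant(f_value, critical=F_CRIT_1_8):
    """True when the observed F exceeds the tabled critical value."""
    return f_value > critical


# Entries as read from Table 5 (decimal placement is an assumption).
order_total = f_ratio(2.43, 0.54)                      # between effect, total
set_education_tourists = f_ratio(0.78, 0.07)           # within effect
set_by_order_education_tourists = f_ratio(0.56, 0.07)  # within effect
```

Under these readings the Order effect for the combined problems gives F = 4.50, which falls short of 5.32, while the Set and Set × Order effects for the Education and Tourists problems give 11.14 and 8.00, both beyond it. This agrees with the asterisks in the table.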

A separate analysis of the unreal (Thumbs— 
People) and real (Education—Tourists) prob- 
lems of each set is informative. The overall 
condition effect is shown to be entirely a 
function of the superiority of the unreal 
problems under the II condition. The su- 
periority of the order GI-II over the order 
II-GI (4.31 versus 3.86) yielded a significant 
order effect for the Thumbs and People prob- 
lem. The significant Set effect for the Edu- 
cation and Tourist problems was due to the 
higher quality of responses to the Education 
problem (3.00 versus 2.64). The Set × Order
interaction indicates that this difference was 
due to inferior performance on the Tourist 
problem under the order II-GI. 


Discussion 


The most striking aspect of this experi- 
ment is the failure to find large differences 
between various conditions in terms of num- 
ber of answers as reported by Taylor et al. 
(1958) and Dunnette et al. (1963). Since 
Taylor’s population of Ss was comparable 
to the population sampled in this experi- 
ment, a comparison of some of the data was 
attempted. Table 6 contains these compari- 
sons. The data consist of the mean total 
number of responses to each of three prob- 
lems. The data in parentheses are taken from 
Taylor et al. (1958), page 34, Table 2. 

The other data are from this experiment. 
Each mean represents the mean performance 
of three groups for the first 10 min. of per- 
formance under the appropriate condition. 
All data were taken from the column headed 
“First” in Table 1. For example, the mean 
for the real groups under Tourist problem 
(35.3) represents the mean performance for 
the first 10 min. of Groups J, K, L, under 
GI,2. Table 6 clearly shows that the differ- 
ences between the real groups are very small, 


TABLE 6

MEAN TOTAL NUMBER OF RESPONSES TO EACH PROBLEM
BY REAL GROUPS AND NOMINAL GROUPS

                                   Problems
Condition           Tourists        Thumbs         Education
Real groups        (38.4) 35.3    (41.3) 41.0    (32.6) 27.0
Nominal groups     (68.3) 39.3    (72.6) 42.7    (63.5) 30.0

Note. Data in parentheses from Taylor et al., 1958, Table 2.


indicating that the brainstorming sessions 
were essentially replications, while the dif- 
ferences between the nominal groups are very 
large. Since Taylor et al.’s conclusion that 
nominal groups are superior to real groups 
across a variety of measures depends on the 
difference in number of responses under each 
condition, the discrepancy in Table 6 is of 
importance. There are a number of possible 
reasons for the discrepancy, such as a dif- 
ferent introductory procedure, a different 
time limit (the present author’s 10 min. 
versus Taylor’s 12 min.), or the fact that Ss 
of this study had a copy of the problem and 
Taylor’s Ss did not. The most crucial pro- 
cedural difference between the experiments 
seems to be that the individual Ss of the 
present experiment wrote their responses 
rather than verbalized them. 
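
The size of the discrepancy can be made concrete by subtracting the present means from Taylor et al.'s means for each problem. The sketch below is only an illustration: the variable and function names are hypothetical, the values are the Table 6 entries, and the comparison ignores the procedural differences (time limit, mode of response) just listed.

```python
# Table 6 means as (Taylor et al., 1958; present experiment).
table_6 = {
    "real": {
        "Tourists": (38.4, 35.3),
        "Thumbs": (41.3, 41.0),
        "Education": (32.6, 27.0),
    },
    "nominal": {
        "Tourists": (68.3, 39.3),
        "Thumbs": (72.6, 42.7),
        "Education": (63.5, 30.0),
    },
}


def study_gaps(rows):
    """Difference (Taylor minus present) in mean responses, per problem."""
    return {
        problem: round(taylor - present, 1)
        for problem, (taylor, present) in rows.items()
    }


real_gaps = study_gaps(table_6["real"])
nominal_gaps = study_gaps(table_6["nominal"])
```

For real groups the gaps are 3.1, 0.3, and 5.6 responses; for nominal groups they are 29.0, 29.9, and 33.5, so the two experiments diverge almost entirely in the nominal-group means.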

This difference in results and procedures 
parallels the difference between Taylor et al. 
(1958), Dunnette et al. (1963), and Parnes 
and Meadow (1963). Parnes and Meadow 
(1963) report that they had their Ss record 
their own ideas in pencil as they spoke 
them. Apparently the individual Ss wrote 
their answers also since this is normally the 
way the test which they used as a problem is 
administered. On the other hand, Taylor 
et al. (1958) and Dunnette et al. (1963) had 
their Ss verbalize their ideas which were re- 
corded on tape. That this difference in proce- 
dure is not trivial is attested to by the find- 
ings of Horowitz and Newman (1964). They 
analyzed spoken and written responses to two 
different questions and found that “Spoken
expression produces more material (words,
phrases, sentences), more ideas and subordinate
ideas, more ancillary ideas, communicative
signals, and orientation signals . . . [p.
646].” For number of ideas alone it took 10
min. to write what took only 2 min. to speak. 
This difference is likely to vary depending 
on the type of problem dealt with, but it is 
of such a large order of magnitude that the 
direction is unlikely to be reversed. Horo- 
witz and Newman (1964) report that one of 
the main reasons for the difference between 
conditions is that “To a great extent our 
subjects could not tolerate silence—even a 
10 or 15 second break seemed to create an 
uneasiness which tended to be filled in... 
[p. 647].” Another factor of importance was 
the greater feeling of commitment on the 
part of Ss who wrote their responses. Writ- 
ing seems to have a permanence to it which 
verbalization, even when tape-recorded, does 
not. It seems that verbalization under in- 
dividual conditions may exhaust the individ- 
ual’s response repertoire more rapidly and 
completely than writing, with prolonged 
silences acting as a stimulus to continue to 
verbalize. These considerations lead us to 
believe that the findings of Taylor et al. 
(1958) are not contradicted; nevertheless, 
it should be noted that they may hold only 
under the special conditions where S verbal- 
izes his responses aloud when working alone. 
Had the groups of the present experiment 
written their responses, Taylor et al.’s 
(1958) results would most likely have been 
replicated. Since there are much larger dif- 
ferences in sample and time between this 
experiment and that of Dunnette et al. 
(1963), no specific comparisons can be made. 

Given the specified conditions of this ex- 
periment, verbalization in the group and 
written individual responses, the results indi- 
cate that in terms of total number and num- 
ber of “good” ideas and/or solutions to a 
problem, it does not matter whether Ss work 
as individuals or in a combination of group 
work followed by individual work. This find- 
ing holds for what we consider to be two 
qualitatively different types of problems, un- 
real or imaginary (Thumbs and People) and 
real or concrete problems (Education and 
Tourists). For the criterion mean quality, 
however, individual performance is superior 
to a combination of group-individual work 
on unreal problems. 


There are a number of possible reasons 
why, contrary to our predictions, the com- 
bination of GI work was not shown to be 
superior to II work. One aspect of the prob- 
lem-solving process postulated by many in- 
vestigators, but not considered in this ex- 
periment, is the stage of incubation. It is 
quite possible that an incubation period, or 
simply a short period of time during which 
to think about the problem without any 
pressure to produce answers, between the 
first and last 10 min. of problem solving, 
might benefit the GI condition more than 
the II condition. A second possibility is that 
the participants in the brainstorming groups 
were so busy trying to think of ideas they 
scarcely listened to what others were saying 
and did not make optimal use of the avail- 
able information. It is also possible that 
training would improve the performance of 
the groups more than that of the individuals. 
Experiment II will focus, in part, on the 
second possibility. 


EXPERIMENT II 


The evidence is quite clear that individual 
brainstorming is superior to group brain- 
storming. It is not clear however that group 
brainstorming is superior to traditional group 
problem solving. The Weisskopf-Joelson and 
Eliseo (1961) study suggests it is not. Par- 
loff and Handlon (1964) report that it is, 
only if we consider written but not spoken 
responses. Brilhart and Jochem (1964) re- 
port only marginally significant results in 
favor of brainstorming while using a written 
mode of response. 

One of the difficulties inherent in evaluating
these studies is their failure to make
a clear distinction between the two proce- 
dures and to distinguish between the task 
and social-emotional aspect of group prob- 
lem solving. In the experiment reported be- 
low, critical group problem solving and brain- 
storming are contrasted. The instructions 
have been written with both of these prob- 
lems in mind. A detailed analysis is pre- 
sented in Bouchard (1967). Briefly, group 
critical problem solving is conceptualized as 
a procedure which assumes that the best 
or most relevant solutions to a problem 
should be sought in light of a well-defined
goal and criteria of quality and relevance.
The major problem-solving heuristic used is 
similar to Newell, Shaw, and Simon’s (1960) 
means-ends analysis. In this situation, group 
members help keep the discussion on track 
by criticizing poor and irrelevant ideas. 
Brainstorming, on the other hand, is con- 
ceptualized as a procedure which involves a 
lessening of critical judgment and use of 
primary process or a sheer associative type 
of idea generating mechanism. This process 
is supposedly facilitated by other group 
members who provide unique stimuli which 
constantly prod the associative mechanism, 
thereby generating more and better sug- 
gestions. 

The basic difference between the two types 
of procedures, from a task perspective, is a 
difference in the type of problem-solving 
heuristic used. 

We have attempted to avoid as many po- 
tential interpersonal or social-emotional prob- 
lems as possible by specifying the instruc- 
tions explicitly enough so that they are 
precluded. To the extent that we are success- 
ful any differences between procedures are 
a function of the problem-solving heuristics 
used rather than interpersonal problems. The 
instructions were written in light of Tuck- 
man’s (1965) developmental theory of small 
groups. Their effectiveness has been evalu- 
ated by having Ss fill out a questionnaire 
after finishing work under each procedure. 

Another important consideration is the 
interaction between type of problem-solving 
procedure and interpersonal behavior. Given 
the group situation and the motives charac- 
teristic of people in a group-problem-solving 
situation, can either of the procedures be 
used efficiently? The group-problem-solving 
situation can be described as a set of “recip- 
rocally contingent interactions.” Jones and 
Thibaut (1958) describe this situation in 
terms of the problem of interpersonal per- 
ception. 


In reciprocal contingency situations, the need for
information is immediate, and it must be quickly
processed since neither actor has much time to
think about the preceding act before having to act
himself. As a consequence of this “urgency”
consideration, we suggest that much of the perceiver’s
attentive energy will be directed to his own future
responses and not the stable characteristics of the
other. Thus the main moment-to-moment problem
is not “What is he like?” but “What am I going
to do next?” [p. 158].


Analogously in the group-problem-solving 
situation, the demand characteristics are such 
that the main moment-to-moment questions 
are not “What are they saying?” but “What 
am I going to say next?” and “What will 
they think of what I say?” This type of set 
or attention distribution is likely to interfere 
more with the brainstorming heuristic, which 
emphasizes the stimulus value of others’ re- 
sponses, than with critical problem solving. 
We can circumvent this problem by provid- 
ing task-relevant feedback, during a period 
in which the demands for productivity are 
reduced. This helps to achieve an optimal 
distribution of attention and information 
input. There is also evidence that techniques 
which allow Ss to become familiar with each 
other’s thinking, regardless of whether or 
not they agree with it, facilitate interpersonal 
communication (Triandis, 1960) and crea- 
tivity under some conditions (Triandis, Hall, 
& Ewen, 1965). The task-relevant feedback 
technique meets these requirements. In this 
experiment each problem-solving procedure 
is divided into a feedback and nonfeedback 
condition. Feedback consists of having Ss 
listen to a taping of their first 5 min. of 
performance on a problem, and then allowing 
them to continue to work on the same prob- 
lem. 

In order to evaluate the relative effective- 
ness of group brainstorming and critical 
group problem solving, “nominal” brain- 
storming groups were also formed. On the 
basis of past research, the “nominal” brainstorming
groups are expected to be superior
to the brainstorming and critical problem- 
solving groups. The feedback condition 
should be superior to the nonfeedback con- 
dition. 

No prediction is made for the brainstorm- 
ing and critical problem-solving groups, as 
such, but the feedback condition for each 
should be superior to the nonfeedback con- 
dition. The brainstorming group should ben- 
efit from feedback more than the critical 
problem-solving group. There should be a 
main effect due to feedback across all
procedures.


Method

Subjects


The Ss were 144 male students drawn predomi- 
nantly from lower division psychology courses in 
which participation in 3 hr. of experimentation 
was a course requirement. 


Design 


The experimental design, a 3 × 2 × 2
repeated-measures design, is given in Table 7 (Winer, 1962,
p. 337). Due to time limitations only two problems
were used (Thumbs and Education). Since
each group worked on both problems (C factor),
the order of solving the problems was counterbalanced
within each A × B level. Pilot work had
demonstrated that the time of day during which 
problem-solving sessions occurred was an impor- 
tant factor. Therefore it was systematically coun- 
terbalanced within problem order and condition. 
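
The layout of the design can be sketched as follows. This is an illustration under stated assumptions rather than the actual assignment schedule: the names are hypothetical, the six groups per cell come from the note to Table 7, and the even split of problem orders within each cell is an assumption about how the counterbalancing was carried out. Time of day is not modeled.

```python
from itertools import product

PROCEDURES = ["brainstorming", "critical problem solving",
              "individual brainstorming"]      # factor A
FEEDBACK = ["no feedback", "feedback"]         # factor B
PROBLEMS = ["Thumbs", "Education"]             # factor C (repeated measure)
GROUPS_PER_CELL = 6                            # from the note to Table 7


def build_design():
    """List every group with its A x B cell and its problem order."""
    schedule = []
    for procedure, feedback in product(PROCEDURES, FEEDBACK):
        for g in range(GROUPS_PER_CELL):
            # Assumption: alternate which problem comes first so that
            # each order occurs equally often within a cell.
            order = PROBLEMS if g % 2 == 0 else PROBLEMS[::-1]
            schedule.append({
                "procedure": procedure,
                "feedback": feedback,
                "group": g + 1,
                "order": tuple(order),
            })
    return schedule


design = build_design()
```

The sketch yields 36 groups in all, which at four Ss per group would account for the 144 Ss reported.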


Procedure 


The instructions for each of the procedures are 
as follows: 


Critical problem-solving instructions: This is 
an experimental study of group problem solving. 
Most of you have never worked on a problem 
in this way, so I will go over the procedure 
with you. This technique is a form of group 
interaction which is used to facilitate the flow 
of ideas. It is widely used in a large number of 
United States corporations. It is generally used 
when new, unique, original, and creative ideas 
are desired. It is not used to solve everyday 
problems. 


The procedure is relatively straightforward and 
easy to comprehend. The rules are as follows: 


1. The problem is analyzed. You should ask
yourselves, both alone and as a group: “What
are its implications?” “What is happening?”
“Why?”

2. You should try to determine, and keep in
mind, what kind of criterion a solution should
meet to be worthwhile. What defines a good
solution?

3. In light of these two points, you should
try to come up with as many good ideas as
you can. By good ideas I mean if someone asks
you to defend or explain what your suggestion
means, you should be able to do so. It should
also be meaningful and useful, not trivial.
Criticism is acceptable but it should be directed only
at ideas, not people or style of delivery. It
should be pertinent and meaningful.

4. Everyone should contribute to the problem.
Don’t hesitate to speak up. As well as
contributing ideas, individuals should also discuss
what lines they are thinking along. In other
words, we want you to maximize the exchange
of information among yourselves. For example,
you might mention what areas seem to you to
be most productive of solutions, what kinds of
assumptions are valid, etc. Let others know what
you are thinking. Don’t be afraid to say what
you think.

I am going to tape-record the problem-solving
session and take a few notes. Don’t let this
distract you.


TABLE 7

EXPERIMENTAL DESIGN

                  c1      c2
a1      b1       G11     G11
        b2       G12     G12
a2      b1       G21     G21
        b2       G22     G22
a3      b1       G31     G31
        b2       G32     G32

Note. a1 = Brainstorming; a2 = Critical problem solving;
a3 = Individual brainstorming (nominal groups); b1 = No
feedback; b2 = Feedback; c1 = Thumbs problem; c2 = Education
problem; G = six groups.


If the group was designated a nonfeedback 
group, the instructions continued. “You will work 
on a problem for 20 minutes. After that you will 
work on a second problem for 20 minutes.” 

If the group was designated as a feedback group, 
the instructions were: 


After you have worked on a problem for 5 
minutes, I will stop you and we will take a 
break. During this time I will replay the tape. 
This will give you some idea of how you are 
doing and give you a chance to digest the ideas 
already put forth. It may perhaps suggest more 
new ideas. After the tape has been replayed, you 
will work on the problem for 10 more minutes. 
After that we will work on a second problem 
in the same way. 


After this, both groups were asked “Are there 
any questions?” If there were questions the
instructions were appropriately elaborated. Following
this, the instructions were read through once
again, and the problem was read to the group. 
If any S requested it, the problem was reread dur- 
ing the problem-solving procedure. 


Group brainstorming instructions. The instruc- 
tions for group brainstorming were identical to 
those for critical group problem solving except 
Steps 1, 2, and 3 were replaced by: 


1. Criticism is ruled out; adverse judgment of 
ideas must be withheld. No one should criticize 
anyone else’s ideas. 

2. Freewheeling is welcome; the wilder the 
idea the better. It is easier to tame down than
to think up. Don’t be afraid to say anything
that comes to mind, the farther out the idea the 
better. This will stimulate more and better ideas. 

3. Quantity is wanted; the greater the num- 
ber of ideas, the more likelihood of winners. 
Come up with as many as you can. 

4. Combination and improvement are sought. 
In addition to contributing ideas of their own, 
participants should suggest how ideas of others 
can be turned into better ideas, or how two or 
more ideas can be joined into still better ideas. 


Individual brainstorming instructions. The
introduction to the instructions for individual
brainstorming was the same as for group brainstorming.
The following instructions were given:


The following rules are for groups. You will be 
working alone; nevertheless, I want you to 
apply these rules as best you can while working 
on these problems. What we are interested in 
is whether or not an individual can brainstorm 
and how he does it. The rules are as follows: 


1. Criticism is ruled out; adverse judgment of 
ideas must be withheld. This is clear for a group. 
For an individual, it means don’t criticize any 
idea that comes to mind. Say everything you 
think of. 

2. Freewheeling is welcome; the wilder the 
idea the better. It is easier to tame down than 
to think up. Don’t be afraid to say anything 
that comes to mind. The farther out the idea 
the better. It will stimulate more and better 
ideas. 

3. Quantity is wanted; the greater the num- 
ber of ideas, the more likelihood of winners. 
Come up with as many as you can. 

4. Combination and improvement are sought. 
In a group, subjects are told they should suggest 
how ideas of others can be joined into still better 
ideas. For an individual this means that you 
should be willing to change suggestions you have 
made. Don’t be afraid to combine and improve on 
them. 

5. In a group, subjects are told “Everyone 
should contribute to the problem. Don’t hesitate 
to speak up. As well as contributing ideas, in- 
dividuals should also discuss what lines they are 
thinking along. For example, you might mention 
what areas seem to you to be most productive 
of solutions. Let others know what you are 
thinking. Don’t be afraid to say what you think.” 
In other words, they are asked to maximize the 
exchange of information among themselves. For 
an individual this means don’t be afraid to 
mention things that come to mind other than 
straightforward suggestions. Talk about what 
you're thinking about. Don’t be afraid to speak 
up. 

The rest of the instructions were the same as for 
group brainstorming, and varied according to 
whether or not there was feedback. 

Note that in this experiment, as opposed to 
Experiment I, responses under the individual brainstorming
conditions were tape-recorded. The author
was the E and was present during all
group-problem-solving sessions. He was not present during
individual problem solving. The E did not participate
and interrupted only when the procedural
rules were not followed. This seldom occurred.

After working on both problems, Ss were asked 
to fill out a questionnaire which is presented in 
full in Appendix B. The questionnaire attempted 
to assess how well the instructions succeeded in 
equating the groups in terms of perceived critical- 
ness, cohesion, satisfaction, and effectiveness. 


Scoring 


Both group and individual problem-solving ses- 
sions were transcribed in toto. All protocols were 
then inspected by two judges who removed dupli- 
cate ideas within group or individual sessions, 
marked each remaining idea or suggestion, and 
cross-checked each other. This analysis yielded the 
criterion—total number of ideas. In order to obtain 
a quality criterion, all ideas were rated for good- 
ness using the following scales. 

For the Thumbs problem a practicality-importance 
scale (see Appendix A) was used. Experiment I as 
well as previous studies had used a probability
scale to rate this problem (Dunnette et al., 1963;
Taylor et al., 1958). Reliabilities were fairly low,
however, and it was felt that the scale did not 
grasp the dimensions of practicality or value, and 
so it was not used. 

For the Education problem the same effectiveness 
scale as was used in Experiment I was employed. 
The effectiveness scale, although better than the 
probability scale, had also yielded poor reliabilities 
in past studies. It was felt that this was due to 
the fact that many ideas were difficult to rate out- 
side of the context in which they were presented. 
In order to avoid this problem and minimize E
bias, the ideas were judged in the following way. 
All ideas were numbered on the original protocol, 
then the protocols of the various conditions were 
systematically mixed. Each judge then rated one 
idea per page, across all conditions, by writing 
each idea number and its value on a small slip of 
paper until all ideas had been rated. If reading 
comments prior to the idea facilitated the rating,
the judge did this. One judge was aware of the 
experimental condition from which some of the 
protocols were taken but the second was not. 

The sum of both judges’ ratings was used as 
the criterion. The intraclass correlation coefficient 
was used to assess the reliability of the two judges 
(Winer, 1962, p. 124). The reliabilities for four 
samples of 50 ideas were .77, .55, .70, .74 for the 
Education problem and .89, .67, .72, .73 for the 
Thumbs problem. The reliabilities are higher than 
those previously reported for these problems and 
are in an acceptable range. In order to improve the
criterion further and reduce possible E bias to a
minimum, both judges, together, went over all
ideas with a discrepancy of 2 or more and reconciled
all the discrepancies. The sum of both judges’
ratings was taken as the index of the idea’s value.
A “good idea” was one with a value of 4 or 
more for the Thumbs problem and a value of 6 or 
more for the Education problem. These lower limits 
were selected because a stricter criterion would 
have yielded zero scores for a number of groups. 
In both cases the lower limits are at the median of 
the distribution for summed ratings of a large 
sample (about 500) of the various conditions. The 
Ss who brainstormed individually were combined 
into nominal groups by combining the scores of 
any four Ss who had worked at the same time of 
day, and deleting overlapping ideas. 
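
The pooling of four individual protocols into one nominal group can be sketched as below. This is a minimal illustration, not the scoring procedure used here: the function name and the sample ideas are hypothetical, and simple text matching stands in for the judges' decisions about which ideas overlap.

```python
def pool_nominal_group(idea_lists):
    """Merge individual idea lists, keeping each distinct idea once."""
    pooled = []
    seen = set()
    for ideas in idea_lists:
        for idea in ideas:
            # Stand-in for a judge's overlap decision: ideas count as
            # the same when they match after trimming and lowercasing.
            key = idea.strip().lower()
            if key not in seen:
                seen.add(key)
                pooled.append(idea.strip())
    return pooled


# Hypothetical protocols from four Ss who worked alone.
members = [
    ["better grip", "new glove designs"],
    ["New glove designs", "redesigned tools"],
    ["better grip"],
    ["faster typing"],
]
nominal_group_ideas = pool_nominal_group(members)
```

Here the four Ss offer six ideas between them, two of which repeat ideas already given, so the nominal group is credited with four different ideas.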


Results 


Before presenting the results of the experiment
on the dependent variables, the
effectiveness of the instructions will be evaluated
by means of the questionnaire. The
instructions were explicitly designed to 
equate the groups as much as possible in 
terms of cohesiveness, motivation, lack of 
conflict, and criticalness with respect to self. 
There were no significant differences between 
feedback and nonfeedback groups within the 
procedures on any questions; these condi- 
tions, therefore, were collapsed. There were 
significant differences between procedures on 
only a few questions; nevertheless, the abso- 
lute values of the ratings for each of the 
procedures are of some interest and the 
results for all questions are reported in Table 
8. 


If one conceives of cohesiveness in terms 
of attraction to the group (Cartwright & 
Zander, 1960, Ch. 3), Questions 1 and 2 tap 
this dimension from two different stand- 
points. Responses to Question 1 indicate that 
Ss in the critical problem-solving group en- 
joyed working in their group significantly 
more than the brainstorming Ss. This finding 
runs counter to the intuitive impression of 
the E and nonsystematic reports from other 
experiments (Taylor et al., 1958). Note, 
however, that in both groups the mean rating 
is favorable. In response to Question 2, which 
could be taken as a measure of commitment 
to the group or motivation to work, there was 
no difference between conditions but the 
means were very low. This seems to indicate 
that S’s commitment to the group qua group 
was not very high. Responses to Question 3 
indicate that Ss in the critical condition had 
a greater commitment to the task, from a 
self-oriented standpoint, than Ss in either of 
the other two conditions. The differences are 
small and their absolute magnitude indicates 
that in all conditions Ss were neutral rather 
than favorable or unfavorable. Participants 
in the critical procedure also enjoyed work- 
ing on the problems more than either of the 
other two procedures (Question 4). Again 
the differences are small. Participants in 
both brainstorming and critical problem- 





TABLE 8
MEANS AND STANDARD DEVIATIONS FOR QUESTIONNAIRE RATINGS BY ALL THREE PROCEDURES

Question   1. Brainstorming            t test a  2. Critical problem solving t test    3. Individual brainstorming t test
no.        M             SD            1 vs. 2   M             SD            2 vs. 3   M             SD            3 vs. 1
1          6.27          1.88          .01       7.69          1.39                    na            na
2          [illegible]   2.44                    3.96          [illegible]             na            na
3          4.48          2.64          .05       5.67          2.29          .05       4.50          2.34
4          5.15          [illegible]   .05       6.23          [illegible]             5.66          2.19
5          5.42          2.04                    [illegible]   2.22                    na            na
6          5.92          1.97                    [illegible]   1.70                    na            na
7          6.21          1.25                    5.96          1.46                    6.54          1.56
8 b        6.04          [illegible]             5.96          1.34                    6.46          2.04
9          6.13          2.68                    7.02          2.30          .01       4.65          [illegible]   .01
10         [illegible]   2.29                    7.29          2.41                    na            na
11         7.48          1.87                    7.06          [illegible]             na            na

a All tests two-tailed.
b Feedback Ss only, N = 24; in all other comparisons N = 48.
c Not asked.
solving groups expressed the same degree of
satisfaction with self or group performance 
(Questions 5 and 6). Again the means fell 
at a neutral point. 

All Ss who worked under feedback condi- 
tions in each of the procedures agreed on 
their judgment that feedback helped, and 
there were no significant differences between 
groups’ judgments on this point (Question 
7). This is of interest since it will be shown 
later that feedback was actually a hindrance 
to the critical groups. Apparently the feed- 
back also made Ss somewhat more com- 
fortable (Question 8). The ratings of per- 
ceived effectiveness indicate that individuals 
in the critical groups felt that their proce- 
dure was more effective than did those who 
engaged in individual brainstorming (Ques- 
tion 9). Surprisingly, this is just the oppo- 
site of the actual results. The absolute value 
of the means indicates that the critical groups 
perceived their procedure as effective while 
the individual brainstormers perceived theirs 
as ineffective. The Ss in both the critical 
and brainstorming groups reported being 
rather nervous, but they do not differ in 
this respect (Question 10). Neither of the 
groups reported feeling that other group 
members were critical of them (Question 11), 
and they do not differ significantly on this 
question. 

The results for the three criteria, total 
number of ideas, number of good ideas, and 
mean quality of ideas, are presented in
Tables 9, 10, and 11. The comparison be- 
tween procedures consists of 20 min. of work 
under the nonfeedback condition and 15 min. 
of work and 5 min. of listening under the 
feedback condition. The comparisons are 
therefore in terms of equal man hours on 
the task, regardless of how the time was 
used. A comparison involving an equal 
amount of work time on the task will be 
presented later. 


Total Number of Ideas and Number of 
Good Ideas 
For total number of ideas and number


of good ideas, the main effect for procedure 
is significant at the .01 level (Table 11). 


TABLE 9

MEANS FOR MAIN EFFECTS: 20 MINUTES OF WORK

Condition                   Total no.   No. good      Mean quality
Brainstorming               29.8        14.1          4.31
Critical problem solving    21.0        9.9           4.47
Individual brainstorming    47.0        [illegible]   4.40
Feedback                    31.2        14.6          4.45
No feedback                 34.1        15.9          4.33
Thumbs problem              40.3        20.3          3.75
Education problem           24.9        10.1          5.04


The overall means (Table 9) for brain- 
storming, critical problem solving, and in- 
dividual brainstorming are significantly dif- 
ferent from each other at the .01 level by 
the Newman-Keuls test (Winer, 1962). By 
the same test individual brainstorming under 
both feedback and nonfeedback conditions 
is clearly superior (.01) to any of the other 
conditions (Table 10). Contrary to the 
author’s prediction, there is no main effect 
due to feedback. The Thumbs problem elicits 
more responses than the Education problem.
There are no significant interactions. In 
order to make a more sensitive test of dif- 
ferences between feedback and nonfeedback 
conditions under the brainstorming and criti- 
cal problem-solving procedures, the individ- 
ual brainstorming condition was dropped 
from the analysis and the ANOVA recalcu- 
lated. See Table 12. 

The main effect for procedure is still sig- 
nificant at the .01 level. A test of simple main 
effects, which are presented in Table 10, in- 
dicates that the difference between the brain- 
storming groups and critical problem-solving 
groups under the nonfeedback condition falls 
short of significance (p < .10). As predicted, 
the difference between the feedback conditions
is significant (p < .01). The difference,
however, is not due to improved perform- 
ance under the brainstorming conditions, but 
rather to poor performance under the criti- 
cal problem-solving conditions. There are no 
significant differences between the feedback 
and nonfeedback groups within any of the 
procedures. 
TABLE 10
MEANS FOR SIMPLE MAIN EFFECTS: 20 MINUTES OF WORK

                          Group brainstorming         Critical problem solving    Individual brainstorming
Criterion                 No feedback   Feedback      No feedback   Feedback      No feedback   Feedback
Total both problems
  Total no. ideas         30.3          29.3          23.7          18.3          48.2          45.9
                          (25.5)                      (19.4)                      (41.1)
  No. good ideas          13.9          14.3          10.0          [illegible]   22.8          20.6
                          (12.2)                      (9.7)                       (19.7)
  Mean quality ideas      4.14          4.48          [illegible]   4.50          [illegible]   4.47
                          (4.21)                      (4.41)                      (4.49)
Thumbs problem
  Total no. ideas         [illegible]   38.3          31.0          21.4          57.3          55.6
                          (31.3)                      (25.4)                      (49.0)
  No. good ideas          18.5          19.7          15.2          11.4          29.7          27.3
                          (16.0)                      (14.0)                      (25.9)
  Mean quality ideas      3.64          3.82          3.63          3.92          3.77          3.68
                          (3.68)                      (3.72)                      (3.84)
Education problem
  Total no. ideas         23.1          20.3          16.1          15.1          38.7          35.9
                          (19.7)                      (13.5)                      (33.2)
  Total no. good ideas    9.3           8.8           6.7           6.3           15.8          13.8
                          (8.3)                       (5.3)                       (13.5)
  Mean quality ideas      4.03          5.14          5.24          5.07          5.07          5.07
                          (4.75)                      (5.10)                      (5.13)

Note. Numbers in parentheses are scores for the first 15 min. of work under the No Feedback conditions.
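
The relation between the main-effect means and the simple-effect means can be checked with a short sketch. It assumes equal numbers of Ss per cell, so that each procedure mean is the unweighted average of its feedback and nonfeedback cell means; the cell values are the total-number-of-ideas entries read from Table 10, and the variable and function names are illustrative only.

```python
# Cell means for total number of ideas (both problems), as read from Table 10.
# Each entry is (nonfeedback mean, feedback mean) for one procedure.
cell_means = {
    "group brainstorming": (30.3, 29.3),
    "critical problem solving": (23.7, 18.3),
    "individual brainstorming": (48.2, 45.9),
}


def marginal_mean(cells):
    """Unweighted mean of the cell means (equal n per cell is an assumption)."""
    return sum(cells) / len(cells)


procedure_means = {name: marginal_mean(cells) for name, cells in cell_means.items()}

for name, mean in procedure_means.items():
    print(f"{name}: {mean:.2f}")
```

Under this assumption the averages agree, within rounding, with the procedure means for total number of ideas given for the main effects.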





TABLE 11

SUMMARY ANALYSIS OF VARIANCE FOR TOTAL NUMBER OF IDEAS, NUMBER OF GOOD IDEAS,
AND MEAN QUALITY OF IDEAS: 20 MINUTES OF WORK

                                         Total no. ideas          No. good ideas           Mean quality of ideas
Source of variation                 df   MS          F            MS          F            MS            F
Between Ss                          35
  Procedure (A)                     2    4209.29     50.59*       857.04                   [illegible]
  Feedback (B)                      1    147.35      [illegible]  30.68                    [illegible]
  A X B                             2    29.85                    12.95                    [illegible]
  Ss within groups (Error b)        30   83.19                    16.25                    [illegible]
Within Ss                           30
  Problems (C)                      1    4247.35     106.32*      1850.44     64.97*       [illegible]
  A X C                             2    114.44      2.84         71.85                    [illegible]
  B X C                             1    14.00                    2.35                     [illegible]
  A X B X C                         2    63.71       1.00         10.18                    [illegible]
  C X Ss within groups (Error w)    30   39.95                    23.23                    [illegible]

*p < .01.


TABLE 12

SUMMARY ANALYSIS OF VARIANCE FOR TOTAL NUMBER OF IDEAS, NUMBER OF GOOD IDEAS: 20 MINUTES
OF WORK, INDIVIDUAL BRAINSTORMING EXCLUDED

                                         Total no. ideas          No. good ideas
Source of variation                 df   MS          F            MS          F
Between Ss                          23
  Procedure (A)                     1    936.34      13.06*       212.52      14.50*
  Feedback (B)                      1    120.34      1.68         9.76
  A X B                             1    56.32                    16.95       1.16
  Ss within groups (Error b)        20   71.68                    14.66
Within Ss                           24
  Problems (C)                      1    2160.09     54.45*       841.69      34.45*
  A X C                             1    90.24       [illegible]  31.69       1.30
  B X C                             1    18.74                    1.95
  A X B X C                         1    114.10      2.88         20.59
  C X Ss within groups (Error w)    20   39.67                    24.43

*p < .01.
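
As a reading aid for the summary table, the following sketch recomputes two of the F ratios for total number of ideas. It assumes the usual mixed-design convention in which each between-Ss effect is tested against the Ss-within-groups mean square and each within-Ss effect against the Problems X Ss-within-groups mean square; the mean squares are those printed in Table 12, and the function name is illustrative.

```python
def f_ratio(ms_effect, ms_error):
    """F statistic: an effect's mean square divided by its error mean square."""
    return ms_effect / ms_error


# Error mean squares for total number of ideas, as printed in Table 12.
ms_error_between = 71.68  # Ss within groups (tests between-Ss effects)
ms_error_within = 39.67   # C X Ss within groups (tests within-Ss effects)

f_procedure = f_ratio(936.34, ms_error_between)  # Procedure (A)
f_problems = f_ratio(2160.09, ms_error_within)   # Problems (C)

print(f"Procedure (A): F = {f_procedure:.2f}")
print(f"Problems (C):  F = {f_problems:.2f}")
```

Both ratios reproduce the tabled values (13.06 and 54.45) to two decimals, which supports the column assignment used in reading the table.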


Mean Quality of Ideas 


For mean quality of ideas, the only sig- 
nificant main effects are due to differences 
between problems. The Education problem 
receives a higher mean rating than the 
Thumbs problem. There are no simple main 
effects or interactions. 

A second way of looking at these results 
is to compare the various conditions in 
terms of amount of time spent working di- 
rectly on the problems. The feedback group 
spent their second 5 min. listening to the 
TABLE 13

MEANS FOR MAIN EFFECTS: 15 MINUTES OF WORK

Condition                   Total no.     No. good      Mean quality
Brainstorming               27.4          13.2          4.35
Critical problem solving    18.9          9.3           4.44
Individual brainstorming    43.5          20.1          4.43
Feedback                    28.7          13.8          4.37
No feedback                 [illegible]   14.6          4.44
Thumbs problem              36.9          19.0          3.78
Education problem           23.0          9.4           5.04














tape and were not allowed to verbalize any 
responses during this time. Thus they spent 
only a total of 15 min. on the problem. In 
order to equate amount of time spent on 
the problem, the last 5 min. of perform- 
ance were dropped from all of the nonfeed- 
back groups. The means for these compari- 
sons are given in Tables 10 and 13. The 
ANOVA is given in Table 14. 
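
The two bases of comparison can be illustrated with the Table 10 means for total number of ideas (both problems combined). The sketch below is descriptive only and makes no claim about significance; the dictionary keys and the function name are hypothetical labels introduced for the example.

```python
# Mean total number of ideas (both problems combined), as read from Table 10.
# "nonfeedback_15" holds the parenthesized entries: the first 15 min. of the
# 20-min. nonfeedback sessions.
TABLE_10_TOTAL_IDEAS = {
    "group brainstorming": {"feedback": 29.3, "nonfeedback_20": 30.3, "nonfeedback_15": 25.5},
    "critical problem solving": {"feedback": 18.3, "nonfeedback_20": 23.7, "nonfeedback_15": 19.4},
    "individual brainstorming": {"feedback": 45.9, "nonfeedback_20": 48.2, "nonfeedback_15": 41.1},
}


def feedback_difference(cell, basis):
    """Feedback mean minus nonfeedback mean on the chosen comparison basis."""
    if basis == "man_hours":   # 20 min. of session time in both conditions
        return cell["feedback"] - cell["nonfeedback_20"]
    if basis == "work_time":   # 15 min. spent producing ideas in both conditions
        return cell["feedback"] - cell["nonfeedback_15"]
    raise ValueError(f"unknown basis: {basis!r}")


for procedure, cell in TABLE_10_TOTAL_IDEAS.items():
    by_hours = feedback_difference(cell, "man_hours")
    by_work = feedback_difference(cell, "work_time")
    print(f"{procedure}: {by_hours:+.1f} (equal man-hours), {by_work:+.1f} (equal work time)")
```

On the equal man-hour basis the feedback means are lower under all three procedures; on the equal work-time basis only the critical problem-solving groups remain lower under feedback.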


Total Number of Ideas and Number of 
Good Ideas 


The main effects for procedures are sig- 
nificant at the .01 level (Table 14). All three 
procedures are significantly different from 
each other at the .01 level. The Thumbs 


TABLE 14

SUMMARY ANALYSIS OF VARIANCE FOR TOTAL NUMBER OF IDEAS, NUMBER OF GOOD IDEAS,
AND MEAN QUALITY OF IDEAS: 15 MINUTES OF WORK

                                         Total no. ideas            No. good ideas             Mean quality of ideas
Source of variation                 df   MS            F            MS            F            MS            F
Between Ss                          35
  Procedure (A)                     2    3753.18       50.41**      727.10        44.44**      .65           [illegible]
  Feedback (B)                      1    115.02        1.54         9.49                       .10
  A X B                             2    59.10                      12.88                      [illegible]
  Ss within groups (Error b)        30   74.45                      16.36                      .30
Within Ss                           30
  Problems (C)                      1    3486.13       99.18**      168.20        [illegible]  28.69         58.55**
  A X C                             2    [illegible]   3.34*        56.29                      .03
  B X C                             1    11.68                      [illegible]                .01
  A X B X C                         2    60.00         [illegible]  18.51                      .21
  C X Ss within groups (Error w)    30   [illegible]                21.64                      .49
problem elicits significantly more responses
than the Education problem. The significant 
Procedure X Problem interaction for total 
number of ideas is due to the fact that the 
critical problem-solving groups do very 
poorly on the Thumbs problem relative to 
the other problem and procedures. Individual 
brainstorming under both feedback and non- 
feedback conditions is again clearly superior 
to any other condition. When the individual
brainstorming condition is dropped and simple
main effects are tested, one finds, as in the
previous analysis, that the brainstorming and
critical problem-solving procedures do not
differ under the nonfeedback conditions.

In order to assess the effects of feedback 
and procedures more directly, performance 
under each of the group-problem-solving 
procedures was divided into 5-min. periods. 
Both problems were combined in order to 
increase the stability of the measures. The 
profiles for total number of ideas and num- 
ber of good ideas are quite similar so only 
the latter are presented. The data are shown 
in Figures 1 and 2. The analysis of variance
(Winer, 1962, p. 302) is presented in Table
15. In the contrast for the nonfeedback conditions,
the only significant variation between
the slopes is in the first 5 min. of
work (p < .05). The main effect for procedures
is not significant, as in the previous
Fig. 1. Number of good ideas produced under the
brainstorming and critical problem-solving nonfeedback
conditions, across quarters.

Fig. 2. Number of good ideas produced under the
brainstorming and critical problem-solving feedback
conditions, across quarters.


analysis. The Procedure X Quarter inter- 
action is not significant. This indicates that 
under both procedures Ss run out of ideas 
at the same rate. When the feedback condi- 
tions are contrasted, a significant main effect 
as in the previous analysis and a significant 
Procedure X Quarter interaction are found. 
A test of simple main effects indicates that 
the profiles differ significantly at both the 
first quarter (p < .01) and at the second
quarter (p < .05). It is clear that the sig- 
nificant difference between procedures fol- 
lowing feedback is due to its detrimental 
effect on the critical problem-solving pro- 
cedure. This conclusion is reinforced by the 


TABLE 15

SUMMARY ANALYSIS OF VARIANCE OF NUMBER OF GOOD IDEAS FOR BRAINSTORMING AND CRITICAL PROBLEM
SOLVING UNDER BOTH FEEDBACK AND NONFEEDBACK CONDITIONS OVER QUARTERS

                                  Feedback              Nonfeedback
Source of variation          df   MS         F          MS         F
Between Ss                   11
  Procedure (A)              1    38.02      19.47*     28.52      2.71
  Ss within groups           10   4.52                  10.52
Within Ss                    36
  Quarter (B)                3    297.97     55.90*     140.19     22.18*
  A X B                      3    27.81      5.22*      8.19       1.30
  B X Ss within groups       30   5.33                  6.32

*p < .01.
fact that the critical groups tend to compen-
sate for this inferior performance by doing 
somewhat better during the third quarter. 
Both of these factors are responsible for the 
significant interaction. 


Discussion 
Induction of Instructions 


One of the main purposes of the experi- 
ment was to test the heuristics, described 
earlier, as adequately as possible, while con- 
trolling for important interpersonal factors 
in small groups. This was attempted pri- 
marily by a very careful specification of in- 
structions. 

Were the instructional sets adequate? Con- 
sidering the results in Table 8, it is clear 
that the critical groups showed somewhat 
greater cohesiveness, motivation, enjoyment, 
and perhaps more satisfaction with their 
procedure than the brainstorming groups. 
They did not feel they worked in a more 
critical atmosphere nor did they report feel- 
ing more nervous than the brainstorming 
groups. From a group dynamics standpoint, 
then, it could not be argued that their per- 
formance would be more disrupted by inter- 
personal factors than it would be in brain- 
storming groups. The reverse is more likely 
to be true. Since the instructions were de- 
signed to forestall problems in these areas, 
particularly in the critical problem-solving 
groups, we feel we were very successful in 
establishing the atmosphere we had hoped 
to achieve. 


Performance 


As predicted, individual brainstorming un- 
der both feedback and nonfeedback condi- 
tions is by far superior to either group brain- 
storming or critical problem solving under 
feedback or nonfeedback conditions. This 
finding clearly confirms the studies cited 
earlier. Two qualifications should be noted,
however. First, the finding sheds no light on
the possibility that training might improve
performance under these conditions, such that
group work would be equal or superior to
individual work. Second, we
may be contrasting groups of a size that 
maximize the differences between individual 
and group work. There may be more rapidly 


diminishing returns as N increases in nomi- 
nal groups than in real groups (see also, 
Thomas & Fink, 1963, and Utterback & 
Fotheringham, 1958). The data in this ex- 
periment suggest that this may very well be 
the case. Both of these possibilities need 
to be systematically assessed before any 
closure on the question of group versus in- 
dividual work is reached. 

The failure to find a significant effect due 
to feedback either as an interaction or as 
a main effect runs counter to the expectations
of this study. A detailed analysis reveals
that, if anything, feedback is a detrimental
procedure for critical problem-solving groups.
For brainstorming, both individual and group,
it appears that feedback in
the form used here can be as effective as 
continuous work, but no more so. 

The significant overall difference between 
critical problem solving and brainstorming 
masks a number of important facts which 
become apparent when the data are analyzed 
in detail. The curves in Figures 1 and 2 
suggest that the major difference between 
critical problem solving and brainstorming 
occurs during the first 5 min., then the 
curves tend to converge. As noted earlier, 
number of ideas is a function of the limited 
number of possible responses. Note, how- 
ever, that the curves do not cross. In the 
nonfeedback condition the brainstorming 
groups do consistently, but not significantly, 
better over every period. In the feedback 
condition the critical groups show the nega- 
tive effects of feedback during the second 
period. In conjunction with the experiments 
cited earlier, it seems fair to conclude that 
brainstorming is superior to critical problem
solving both when responses are written and
when they are taped, but this superiority is
rather small in size.

If we extrapolate to the practical situation 
where group procedures often must be used, 
these results have a number of implications. 
First, if all one wants is a fixed number of 
good ideas, and this number is not too large, 
brainstorming is no better than critical prob- 
lem solving and probably not worth the 
trouble. Second, if the payoff:cost ratio is 
small, that is, if just a few more good ideas 
could be very valuable, it is preferable to use 
brainstorming. These conclusions hold for
both the equal man-hour and equal work- 
time comparison. If one takes into account 
that the brainstorming procedures were novel 
and somewhat strange to the Ss (none of 
them had seen or heard more than a periph- 
eral discussion of brainstorming), it may be 
that these conclusions are too restrictive. 
That is, training may help brainstorming 
groups more than critical problem-solving 
groups. Also, as with the comparison be- 
tween group and individual brainstorming, 
these conclusions may have to be seriously 
modified for large groups. Group size and 
amount of training are two variables that 
should be systematically varied in future 
studies. Proponents of brainstorming have 
always suggested larger groups than those 
used in this and previous experiments, and 
it seems plausible that brainstorming would 
be more effective than critical problem solv- 
ing in such groups. 

The results of the feedback—nonfeedback 
manipulation are of some importance in spite 
of the lack of an overall significant differ- 
ence or significant interaction. They suggest 
that, if anything, feedback is a detriment 
to critical problem-solving groups. For brainstorming,
both individual and group, it appears
that feedback in the form used can
be as effective as continued work, but not 
more effective. More important, however, is 
the fact that real problem-solving groups 
invariably receive some form of feedback at 
some time or other. If these results have any 
generality with respect to various types of 
feedback (ours could be characterized as
both task and socially relevant) and various 
sources of feedback (e.g., from people other 
than group members), then brainstorming 
may be the preferred procedure on this basis 
also. 


PERSONALITY DATA 


Prior to participation in the problem-solving
part of the experiment, Ss were tested
for 1 or 2 hr. They were given the California
Psychological Inventory (CPI) (Gough,
1957), the Gough-Sampson College Vocabulary
Test (Ss in Experiment I did not take
this test) (Gough & Sampson, 1954), and
the Firo-B Scales (Schutz, 1958). In Experiment
II only some Ss took the Myers-
Briggs type indicator (Myers, 1962). The
type indicator was administered in an un- 
orthodox fashion. All items not scored on 
the current scales were crossed out in the 
test booklet and Ss were told to ignore them. 
This procedure may have influenced the 
results in some unknown fashion. 

The data from Experiment I consist of 
the Ss in Groups D, E, and F (see Table 1) 
plus 11 replications. Five Ss were dropped 
from the analysis for either failing to take 
the personality tests or for invalidating them. 

Correlations between personality variables 
and performance, as measured by number of 
good ideas, are reported in Table 16. The 
first 18 variables in Table 16 are from the 
CPI. Variables 19-24 are from the Firo-B, 
Variables 25-28 are from the Myers-Briggs 
type indicator, and Variable 29 is the College 
Vocabulary Test. Variables 30-34 repre- 
sent five factors extracted from a factor 
analysis of the CPI. 


Factor Analysis 


The factor analysis is reported in Table 
17 and was obtained from a normal Vari- 
max rotation with 1’s in the diagonals (Kai- 
ser, 1959) and included all Ss (N = 194). 
Factor I, Interpersonal Effectiveness, is defined
primarily by the first five scales of
the CPI (Do, Cs, Sy, Sp, and Sa). These
scales reflect dominance, poise, ascendency,
and self-assurance. This factor corresponds 
closely to the conventional extroversion fac- 
tor (Sy) but it also reflects efficient intellec- 
tual functioning (see Gough, 1957, p. 36) 
and high interpersonal effectiveness (Do, Sp, 
Sa). These two facets of behavior are not 
generally implied in the concept of extro- 
version which tends to focus primarily on 
social and participative aspects of behavior. 
Factor II, Adjustment, is defined by the 
Good Impression, Self-Control, Achievement 
via Conformance, Responsibility, and Sense 
of Well-Being scales. This factor reflects 
good adjustment in the form of maturity, 
social responsibility, and motivation to 
achieve in conventional ways. Factor III, 
Intellectual Functioning, is defined by the 
Flexibility, Achievement via Independence, 
Tolerance, Psychological-Mindedness, and In- 
TABLE 17
FACTOR ANALYSIS OF CPI SCALES: ORTHOGONAL FACTOR LOADINGS FOR VARIMAX ROTATION

                                    Percent communality and loadings for 5 factors
Variables                           Interpersonal  Adjustment     Intellectual   Conven-        Femininity
                                    effectiveness                 functioning    tionality
Percent communality                 30.60          [illegible]    17.61          13.46          [illegible]

1. Dominance                        .83            .23            -.08           [illegible]    .04
2. Capacity for Status              .74            [illegible]    .30            -.09           -.02
3. Sociability                      .89            [illegible]    .01            [illegible]    -.15
4. Social Presence                  .79            -.08           [illegible]    -.05           -.28
5. Self-Acceptance                  .88            -.10           .04            .06            -.02
6. Sense of Well-Being              .28            .67            [illegible]    [illegible]    -.28
7. Responsibility                   .06            .70            .01            [illegible]    .31
8. Socialization                    -.06           .48            -.10           .69            [illegible]
9. Self-Control                     -.37           .83            .20            .15            .02
10. Tolerance                       [illegible]    .56            .61            .21            -.19
11. Good Impression                 [illegible]    .87            .09            -.10           .01
12. Communality                     [illegible]    .08            -.04           .86            .00
13. Achievement via Conformance     .28            .72            .06            .36            -.01
14. Achievement via Independence    .06            .32            [illegible]    .16            .11
15. Intellectual Efficiency         .52            .38            .50            .24            -.18
16. Psychological-Mindedness        .24            .38            .50            -.20           -.07
17. Flexibility                     .02            -.25           .78            -.30           .04
18. Femininity                      -.18           .03            .03            -.02           .92








tellectual Efficiency scales. This factor re- 
flects a flexible, tolerant, independent, and 
efficient mode of intellectual functioning. 
Factor IV, Conventionality, is defined by the 
Communality and Socialization scales. It re- 
flects a tendency to adhere strictly to the 
rules (Socialization). Factor V, Femininity, 
is defined exclusively by the Femininity 
scale. The high end of the Fe scale reflects 
feminine interests (e.g., appreciative, pa- 
tient, helpful, sympathetic). For males a high 
Fe score reflects an openness and willingness 
to admit unconventional interests and feel- 
ings. 
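
The way a scale is assigned to a factor can be made concrete with a small sketch. It assumes an orthogonal solution, so that a scale's communality is the sum of its squared loadings, and it treats the factor with the largest absolute loading as the defining factor; that rule of thumb is an assumption of the example, not a procedure stated in the text. The three rows are legible entries from Table 17.

```python
FACTORS = [
    "Interpersonal Effectiveness",
    "Adjustment",
    "Intellectual Functioning",
    "Conventionality",
    "Femininity",
]

# Varimax loadings for three CPI scales, as read from Table 17.
loadings = {
    "Self-Acceptance": [0.88, -0.10, 0.04, 0.06, -0.02],
    "Self-Control": [-0.37, 0.83, 0.20, 0.15, 0.02],
    "Flexibility": [0.02, -0.25, 0.78, -0.30, 0.04],
}


def communality(row):
    """Proportion of a scale's variance accounted for by the five orthogonal factors."""
    return sum(value * value for value in row)


def defining_factor(row):
    """Name of the factor on which the scale has its largest absolute loading."""
    index = max(range(len(row)), key=lambda i: abs(row[i]))
    return FACTORS[index]


for scale, row in loadings.items():
    print(f"{scale}: h2 = {communality(row):.2f}, defined by {defining_factor(row)}")
```

For these three scales the largest loading falls on the factor to which the text assigns them.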

In Experiment I the procedure, scoring,
and sample of Ss were slightly different from
Experiment II. Nevertheless the group brainstorming
condition in Experiment I and the
group brainstorming nonfeedback condition 
in Experiment II are similar enough to con- 
sider the second a cross-validation of the 
first. 


Individual Brainstorming 


The Ss in Experiment I were tested under 
both group and individual conditions in that 
order. It was felt that this may have resulted 











in a considerable carry-over from the group 
to the individual situation. Also the individ- 
ual Ss in Experiment I wrote their responses. 
If the correlations for the individual condi- 
tion in Experiment I are compared with the 
individual brainstorming nonfeedback con- 
dition, one sees that there was considerable 
shrinkage for Dominance and Self-Acceptance,
and a reversal for Expressed Control,
the three variables having the highest cor- 
relations with the criterion in the original 
sample. This shrinkage may be due to either 
(a) a carry-over effect in the original sample
which did not occur in the replication or (b)
the fact that Ss in the replication sample
verbalized rather than wrote their responses.
In view of the clear-cut cross-validation of
the group findings reported below, the former
explanation seems to be the more plausible.
Under the feedback condition no personality 
variables are significantly related to per- 
formance. 


Group Brainstorming 


Under the brainstorming nonfeedback con- 
dition the first five scales of the CPI cross- 
validate from the original brainstorming 
study. The Tolerance and Intellectual Efficiency
scales drop somewhat, but the Expressed
Inclusion and Expressed Control
scales of the Firo-B, the Extroversion-Introversion
scale of the Myers-Briggs type indicator,
and Factor I (Interpersonal Effectiveness)
of the CPI cross-validate very well.
The judging-perceiving scale was not significant
in the cross-validation sample, but it
did cross-validate in the sense that the magnitude
of the correlation was the same in both samples.
Two variables which are highly related to
performance in the replication sample, but
not in the original sample, are the Wanted
Control scale of the Firo-B and the Sensation-Intuition
scale of the type indicator.
Both of these scales had been expected to
correlate with performance on the original
sample and we were surprised when they did
not. The pattern of correlations between the
Sensation-Intuition scale and other variables
was compared in both samples. The patterns
were very similar, indicating that the modification
of the testing procedure, or sampling
differences, are unlikely to be the reasons
for this finding. The Expressed-Affection
scale of the Firo-B also relates to performance
in the replication sample but not in
the original. The College Vocabulary Test is
unrelated to performance.

One of the most striking findings of this 
experiment is that the correlations between 
personality variables and performance in the 
brainstorming-feedback groups are very dif- 
ferent from those in the brainstorming-non- 
feedback groups. Only one variable, Social 
Presence, is significantly related to perform- 
ance in the brainstorming feedback group. 
The only other variable with a sizable correlation
coefficient in this and all other groups
is Sociability.


Critical Problem Solving 


In the critical problem-solving nonfeedback
groups again Sociability is significantly
related to performance. The Expressed-Inclusion,
Expressed-Affection, and Wanted-Affection
scales of the Firo-B also yield significant
correlations. Factor I (Interpersonal
Effectiveness) and Factor IV (Conventionality)
are nearly significant. In the critical


problem-solving feedback groups, only Ca- 
pacity for Status and Sociability were related 
to performance. Again Factor I (Interper- 
sonal Effectiveness) and Factor IV (Conven- 
tionality) are nearly significant. No variable 
is significantly related to performance under 
all five group conditions, but Sociability and 
Factor I (Interpersonal Effectiveness) are 
nearly so, and the Firo-B Expressed Control 
scale is a close third. 


Feedback


If only the nonfeedback group problem-solving
conditions are considered, Sociability,
Expressed Inclusion, and Expressed Control
are all significantly related to performance.
Given larger samples there is no question
that the Myers-Briggs Extroversion-Introversion
scale and Factor I (Interpersonal Effectiveness)
would also have correlated with
performance across all three samples. It
should also be noted that the correlations
with total number of ideas (not reported
here) were similar to those reported here,
only somewhat lower in magnitude.

No variable is significantly related to per- 
formance in both group feedback conditions, 
but Self-Control and Sociability are nearly
so.

In order to test for homogeneity within 
the samples studied, means on all variables 
were compared between the three major pro- 
cedures (brainstorming, critical problem 
solving, and individual brainstorming) and 
between feedback and nonfeedback condi- 
tions within each procedure. A two-tailed t 
test yielded no significant differences on any 
variable at the .01 level and 7 significant 
differences out of 174 at the .05 level, no
more than the 8 or 9 expected by chance alone.
The groups can therefore be regarded as
homogeneous with respect to the variables studied.
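
The count of significant differences can be set against what chance alone would produce. The sketch below treats the 174 t tests as independent, which is an assumption made only for illustration, since tests on correlated personality scales are not independent; the function name is hypothetical.

```python
from math import comb


def binom_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_tests = 174   # comparisons made across variables and conditions
alpha = 0.05    # nominal significance level of each test
observed = 7    # differences reported as significant at the .05 level

expected_by_chance = n_tests * alpha
p_at_least_observed = binom_upper_tail(n_tests, observed, alpha)

print(f"expected by chance: {expected_by_chance:.1f}")
print(f"P(at least {observed} significant | all nulls true): {p_at_least_observed:.2f}")
```

Under these assumptions the observed count falls below the number expected when no true differences exist, so the significant t tests give no indication of heterogeneity among the groups.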


DISCUSSION 


The results of the analysis of the person- 
ality data are very clear. There is no strong 
and consistent relationship between person- 
ality as measured by the CPI, Firo-B, and 
Myers-Briggs type indicator and all three 
individual performance conditions in this ex- 
periment. There are low correlations across 
the three individual conditions for Sociabil- 




ity, Communality, and Intellectual Efficiency 
on the CPI and judging-perceiving on the 
type indicator. Feedback under the individual
problem-solving procedure has no effect on
overall effectiveness; however, it interacts
strongly with personality type. Specifically,
it depresses the performance of the
type of person who is generally quite effective
either alone or in a group. This is indicated
by the marked contrast in the correlations
between the individual brainstorming
nonfeedback and the individual brainstorming
feedback groups for a number of
scales (Cs, Sy, Sp, Wb, So, Gi, Fe). This
finding is reinforced by the high negative
correlations (-.44, -.42) between sociability
and satisfaction with group and self-performance
under the feedback conditions, as
contrasted to the high positive correlations
(+.42, +.42) between the same variables
under the nonfeedback condition.

The correlations between personality vari- 
ables and performance in the various group 
conditions are more consistent than the cor- 
relations between personality variables and 
performance under various individual con- 
ditions. They are not as consistent from 
group to group as one might hope, and a 
variety of qualifications are necessary. The 
cross-validation of the original brainstorming
group in the brainstorming nonfeedback con- 
dition was very successful as Table 16 in- 
dicates. There is an interaction between per- 
sonality and feedback under both group 
problem-solving procedures, similar to that 
under the individual procedure. One reason 
for this effect may be that under feedback 
conditions the more active subjects react neg- 
atively to their own performance and their 
activity decreases. In the group situation 
this perhaps spurs other members to per- 
form, since in most groups a minimum level 
of activity is almost always maintained. 
Performance on the part of the active in- 
dividual does not cease altogether, because 
he realizes that the other active group mem- 
bers sounded no better than he did, and his 
status with respect to other group members 
does not change very much. In the individual 
situation, the active subject has no one with 
whom to compare himself, and his negative 


evaluation of his performance is likely to 
cause a greater deterioration in his perform- 
ance, thereby generating a much lower cor- 
relation coefficient. This interpretation is re- 
inforced by the very low ratings by individ- 
uals, of the effectiveness of the procedure 
(Table 8). The relative levels of the corre- 
lations for the critical feedback and non- 
feedback groups do not differ greatly. In line 
with the tentative explanation given above, 
we would argue that in the critical problem- 
solving nonfeedback condition all Ss are ac- 
tually getting feedback from each other as 
they go along. Therefore no differential ef- 
fect would be expected. There are no per- 
sonality variables that uniquely characterize 
the nonfeedback condition as opposed to the 
feedback condition. This indicates that it is 
unlikely that the feedback procedure stimu- 
lates any particular kind of person to per- 
form more effectively. French (1958) has 
presented evidence that task-feedback has 
a tendency to improve the performance of 
achievement-oriented persons. Those results 
did not generalize to this study where the 
achievement scales of the CPI were used as 
indicators of achievement-orientation. 

If we look at all the group problem- 
solving conditions we find only one scale, 
Sociability, and one factor, Interpersonal Ef- 
fectiveness, consistently (but not always sig- 
nificantly) related to performance. The 
Sociability scale is the defining scale of Fac- 
tor I and fittingly enough it was empirically 
developed to predict social participation 
(Gough, 1952). Thus there is no question 
that Interpersonal Effectiveness is a powerful 
predictor of problem-solving effectiveness in 
small groups. Gough’s personological charac- 
terization of the high scorer on Sociability 
is very succinct and worth quoting at this 
point. He reports that high scorers tend to 
be seen as “outgoing, enterprising, and in- 
genious, as being competitive and forward, 
and as original and fluent in thought” 
(Gough, 1957, p. 10). 

If we restrict ourselves to effective per- 
formance in just the group brainstorming 
nonfeedback conditions a larger number of 
individual variables come into play. Some of 
these variables differ from Sociability and
are similar to each other in that they re-
flect a greater degree of dominance, control,
aggressiveness, and self-seeking (Dominance,
Expressed-Control, Capacity for Status). The 
others reflect self-confidence, spontaneity, 
and expressiveness (Social Presence, Self- 
Acceptance). The personological implications 
of these added variables are quite clear. 
High scoring subjects in the brainstorming 
groups have well developed social skills, are 
outgoing, enterprising, original, verbally flu- 
ent, fluent in thought, somewhat aggressive, 
dominant, and controlling, and yet concerned 
with feelings of others. They possess self- 
assurance and are spontaneous, expressive, 
and enthusiastic. 

The failure to find any significant correla- 
tions between our adjustment factor (Factor 
II) and performance under any of the group 
conditions runs counter to Heslin’s (1964) 
general conclusion that adjustment is fairly 
consistently related to performance measures 
in small groups. This discrepancy may be 
accounted for by the fact that both our 
criterion and predictor variables differ con- 
siderably from those used in the studies he 
reviewed. Nevertheless our results decrease 
the generality of his conclusions consider- 
ably. 

In view of the magnitude of the relation- 
ships between personality variables and per- 
formance reported here, it seems reasonable 
to point out that the task was relatively neu- 
tral and required a minimal amount of inter- 
personal interaction. It is not at all unlikely 
that personality factors may be of even 
greater importance when the task has a high 
personal valence, and the participants are 
required to interact to a greater degree. 

The data reported here suggest a number 
of interesting possibilities for future research. 
The correlations between personality and 
performance under the various group prob- 
lem-solving procedures suggest that proper 
subject selection can increase performance 
in brainstorming groups more than in criti- 
cal problem-solving groups. Studies of homo- 
geneous and heterogeneous groups should 
focus primarily on measures of interpersonal 
effectiveness. Earlier, reference was made to 
the use of complex predictors to predict per- 


formance in groups. These data indicate that 
personality variables would definitely be of 
value in such an endeavor. The correlations 
between CPI measures and measures of in- 
tellectual functioning (Gough, 1957, p. 36) 
indicate, nevertheless, that the former are 
not entirely free of general ability. Our fail- 
ure to find an effect due to feedback should 
be viewed with caution since extended prac- 
tice might show a positive effect, particularly 
in the brainstorming groups. This possibility 
should be investigated. Unlike Taylor et al. 
(1958) we feel that extended practice and 
perhaps more specific instructions would 
benefit groups more than individuals. Some 
of the procedures suggested by Gordon 
(1961) would be likely candidates. We have 
also suggested that studies contrasting real 
groups versus nominal groups should sys- 
tematically vary group size. During construc- 
tion of the nominal groups, it was readily ap- 
parent to us that, in spite of having con- 
tributed a large number of ideas, the fourth 
individual was contributing little to the nomi- 
nal group. Thus there may be more rapidly 
diminishing returns in nominal groups than 
real groups. The fact that individuals can 
be selected for group work on the basis of 
personality, without necessarily selecting 
against individual ability and perhaps in 
favor of it, suggests another source of differ- 
ential improvement for group performance. 
Therefore, in spite of the negative findings 
concerning group performance reported in 
this study, there is little doubt that the 
question of group versus individual perform-
ance needs more extensive investigation.


APPENDIX A

Effectiveness Scale


0 = no conceivable contribution to solution of
problem, suggestion impossible of attainment;
1 = very little, if any, contribution to solution of
problem; 2 = probably some contribution to sol-
ution of problem; 3 = definite minor contribution
to solution of problem; 4 = clearly a major
contribution to solution of problem.


Probability Scale 


0 = very highly improbable or clearly impos-
sible; 1 = conceivable, but improbable; 2 = pos-
sible; 3 = probable; 4 = highly probable.


Practicality-importance Scale 


0 = impractical or unimportant; 1 = not too
practical or not too important; 2 = somewhat
practical or somewhat important; 3 = practical
or important; 4 = highly practical or very im-
portant.


APPENDIX B 
QUESTIONNAIRE


1. How much did you enjoy working with this
group of people?
(I did not enjoy it at all / I enjoyed it very much)

2. How would you feel if I told you that your
group did not do a good job?
(It would not bother me in the least / I would be very disappointed)

3. How would you feel if I told you that you
did not do a good job?
(It would not bother me in the least / I would be very disappointed)

4. How much did you enjoy working on these
particular problems?
(I did not enjoy them at all / I enjoyed them very much)

5. How satisfied were you with the group’s
performance?
(Very dissatisfied / Very satisfied)

6. How satisfied were you with your own
performance?
(Very dissatisfied / Very satisfied)

7. How did listening to the tape recording
of the group’s early work affect your later
performance?
(Hindered it / Had no effect / Helped it)

8. After listening to the tape how did you
feel about suggesting further ideas?
(Felt less comfortable and was less willing / Felt more comfortable and was more willing)

9. Do you feel that this is an effective way
to solve problems?
(Very effective procedure / Very ineffective procedure)

10. How nervous did you feel during the group
session?
(Very nervous / Not at all nervous)

11. All groups exert some sort of critical
atmosphere. In this group how strongly
do you feel you were being judged or
criticized by other group members?
(Did not feel I was being judged at all / Felt I was being judged a great deal)


(Received February 23, 1968)


INSTRUMENTALITY THEORY OF WORK MOTIVATION:
SOME EXPERIMENTAL RESULTS AND SUGGESTED MODIFICATIONS * 


GEORGE GRAEN 2


University of Illinois, Urbana 


Instrumentality theory hypothesizes that a person’s attitude toward an
occurrence (outcome) depends on his perceptions of how that outcome is 
related (instrumental) to the occurrence of other more or less preferred 
consequences. In this paper we propose an extension of this theory, describe 
the results of an experimental design to test deductions flowing from the 
extended model, and suggest how our results lead to further modifications 
in the theory. The Ss working in a simulated organization were assigned 
randomly to the following three treatments: (a) a condition where favorable 
feedback of high achievement was perceived to be contingent upon effective 
performance, (b) a condition where Ss received an outcome of money which 
was not contingent upon effective performance, and (c) a control condition 
where Ss received neither achievement feedback nor money. Results of this 
experiment, conducted in a realistic but carefully controlled work setting, 
show that instrumentality theory predictions of particular levels of job 
satisfaction and/or job performance are confirmed under only a few rather 


narrowly specified conditions. 


In his book Work and Motivation (1964),
Vroom reviewed much of the empirical evi-
dence on work motivation within the context
of a single theoretical model. His model of
work motivation is drawn from “instrumen-
tality” conceptualizations presented earlier by
Peak (1955) and is in turn similar to theories
presented by Rotter (1955), Atkinson (1958), 
and Tolman (1959). Instrumentality theory 
(Peak, 1955) hypothesizes that a person’s 
attitude toward an outcome (state of nature) 
depends on his perceptions of relationships 
(instrumentalities) between that outcome and 
the attainment of various other consequences 
toward which he feels differing degrees of 
liking or disliking (preferences). In essence, 
then, Peak’s theory hypothesizes that a per- 
son’s attitude toward something (say, racially 
integrated housing) increases monotonically 
with the algebraic sum of the products of his 
perceived instrumentalities between other 
consequences of integrated housing and his 
relative preferences for seeing such conse- 
quences come about. Some support for this 
type of linkage has been provided by Peak
(1955) and her colleagues (Carlson, 1956;
Peak, 1960; Rosenberg, 1956). Vroom (1964) 
applied this general approach to the area of 
work motivation by postulating that the 
valence (preference) for attaining a first-level 
outcome—such as being in an occupation, on 
a particular job, or being told that his per- 
formance is effective—depends on what he 
perceives to be the other consequences (sec- 
ond-level outcomes) of that outcome and how 
attractive these other consequences are. This 
notion leads directly to statements about the 
relative level of satisfaction a person may feel 
toward various first-level circumstances (out- 
comes) such as certain occupations, jobs, or 
job accomplishments. 

In addition to instrumentality, Vroom in- 
troduces expectancy into his model of work 
motivation. Employing the concept in the 
same way as Rotter (1955), Vroom defines 
expectancy as a person’s perception (sub- 
jective probability) of how his actions may 
be related to the attainment of first-level 
outcomes. Thus, for Vroom, the “force” impel- 
ling a person to perform a particular job- 
related action depends on the person’s pref- 
erence (valence) for a first-level outcome and 
his subjective probability estimate (expect- 
ancy) that his action will result in attaining 
that outcome. The term “force” denotes the 
relative probability that the action will be 
emitted. Combining instrumentality with ex- 
pectancy, hypotheses related to goal or out- 
come-oriented job behaviors become pos- 
sible—behaviors such as occupational choice, 
taking or staying on a particular job, or 
exerting effort toward becoming effective on 
a particular job. 

With the above statement of Vroom’s model 
in mind, we now propose a modest extension 
of the model and describe an experiment de- 
signed to test certain deductions from the 
extended model with additional modifications 
of the model based on results obtained in the 
experiment. 


EXTENSION OF VROOM’S MODEL


This extension of Vroom’s formulation 
draws on a number of additional concepts. 
First, role concepts are employed to help 
in defining differences between first-level and 
second-level outcomes and to help specify 


more fully the nature of relationships between
various concepts. Second, the model is taken
out of the relatively constrained ahistorical
approach to a broader historical approach.
Third, the model is modified so as not to
depend on field theory concepts, which are
believed often to contain undesirable sur-
plus meaning.

The present instrumentality theory of work
motivation views individual work behaviors
(both attitudinal and instrumental) as out-
puts of a work personality–work role system.
This theory focuses on a single individual’s
work personality in his work role, and all
concepts refer to this basic unit. One com-
ponent of this system, an individual’s work
personality, is defined, in part, as a person’s
preferences for various consequences of at-
taining work roles and his dispositions for
perceiving and evaluating various instrumen-
tality and expectancy relationships. The other
component of this system, an individual’s
work role, is defined as a set of behaviors ex-
pected by the organization and considered ap-
propriate of an incumbent of a position within
the organization. Some examples of work
roles are an occupational group member, an
incumbent of a particular job, an effective
job performer, a leader, and a team member.
Work roles must be attained and maintained
through performing the expected behaviors in
such a way that the resulting performance
meets the minimum standards of appropriate
behavior. These standards of appropriate be-
havior imply an evaluation of the expected
behaviors by an external agent (e.g., a super-
visor) and, also, an organizational contingency
between meeting the criteria of appropriate
behavior and the attainment or maintenance
of that work role. Although one person may
attain or maintain several different work roles
at one time, this formulation focuses on only
one role. For each role, the major variables
include the expected behaviors and criteria for
appropriate behavior and the contingency be-
tween behavior and the attainment or main-
tenance of that single work role. Moreover,
associated with the attainment or maintenance
of each work role are role outcomes. Role
outcomes are defined as particular outcomes
accruing to a person from the attainment or
maintenance of work roles. Some examples of
role outcomes are a feeling of achievement,
recognition, responsibility, status, money,
working conditions, interpersonal relations,
and occupational development.


Attraction of Work Roles 


The attraction of a work role for an in- 
dividual depends on the perceived attraction 
of various role outcomes and the perceived 
instrumentality of that work role for the at- 
tainment of these various role outcomes. The 
attraction of a role outcome for a person is 
defined as his preference for attaining that 
outcome. Attraction is assumed to vary from 
positive through zero to negative values. As 
with valence (Vroom, 1964), attraction differs 
from realized satisfaction: Attraction is viewed 
as the anticipated satisfaction with an out- 
come. Although attraction may be viewed by 
some as synonymous with valence, the present 
author feels that the concept of valence con- 
tains too much “surplus meaning” to be use- 
ful, because it seems inexorably embedded in 
the ahistorical field theory approach. 

The second variable determining the at- 
traction of a work role for a person is the 
perceived instrumentality relationship between 
the attainment of the work role and the at-
tainment of the various role outcomes. In-
strumentality is defined as the degree of be-
lief that the attainment of a particular work
role will be followed by the attainment of 
one or more role outcomes. Instrumentality 
is viewed as a perceived correlation between 
the attainment or nonattainment of a par- 
ticular work role and the attainment or non- 
attainment of a particular role outcome. In- 
strumentality relationships are viewed as 
varying from +1.00 through .00 to —1.00. 
An instrumentality of +1.00 indicates a belief
that the role outcome is certain after the at- 
tainment of the work role and impossible 
without it; an instrumentality of —1.00 in- 
dicates the reverse; and an instrumentality 
of .00 indicates a belief of no relationship be- 
tween attainment or nonattainment of the 
work role and of that role outcome. If the 
chances of receiving a role outcome are im- 
proved by the attainment of a work role, this 
work role has positive instrumentality for 
the attainment of that role outcome; the 
work role is viewed as helping to attain that 


outcome. In contrast, if the chances of re- 
ceiving a role outcome are decreased by the 
attainment of a work role, this work role has 
negative instrumentality for the attainment 
of that role outcome; the work role is seen as 
interfering with the attainment of that out- 
come. 
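
The definition of instrumentality as a perceived correlation between two dichotomies (work role attained or not, role outcome attained or not) can be illustrated numerically. The sketch below is not part of the original formulation. It assumes, purely for illustration, that a person's beliefs can be summarized as four counts of cases, and it uses the phi coefficient as the correlation; the function name and its arguments are hypothetical.

```python
import math


def instrumentality(role_outcome, role_no_outcome, no_role_outcome, no_role_no_outcome):
    """Phi coefficient for a 2x2 table of work role attainment by role outcome attainment.

    Arguments are counts of cases (an illustrative assumption):
      role_outcome        role attained, outcome attained
      role_no_outcome     role attained, outcome not attained
      no_role_outcome     role not attained, outcome attained
      no_role_no_outcome  role not attained, outcome not attained
    """
    a, b, c, d = role_outcome, role_no_outcome, no_role_outcome, no_role_no_outcome
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        # No variation in role or outcome: treat as "no relationship".
        return 0.0
    return (a * d - b * c) / denominator


# Outcome certain with the role and impossible without it.
certain = instrumentality(10, 0, 0, 10)
# The reverse pattern.
reverse = instrumentality(0, 10, 10, 0)
# Outcome equally likely with or without the role.
unrelated = instrumentality(5, 5, 5, 5)
```

Under these assumptions the three calls reproduce the anchor points described in the text: +1.00, -1.00, and .00, respectively.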

For all role outcomes, attraction and in- 
strumentality are assumed to combine in a 
multiplicative manner and summate to pro- 
duce overall attraction. In this way, both 
positive and negative attraction and positive 
and negative instrumentality are allowed to 
enter the equation. One basic assumption of 
this model is that people are attracted toward 
favorable outcomes and away from unfavor- 
able outcomes. This model also reflects the 
instrumentality assumption that instrumen- 
tality moderates the relationships between the 
attraction of a work role and the attraction of 
the various role outcomes. The multiplicative 
manner of combining attraction and instru- 
mentality reflects this latter assumption. It 
states that positive attraction toward a work 
role can be enhanced in two different ways: 
First, role outcomes with positive attraction 
may combine with positive instrumentality. 
Second, role outcomes with negative attrac- 
tion may combine with negative instrumen- 
tality. In contrast, positive attraction for a 
work role may be decreased in two other 
ways: First, role outcomes with positive at- 
traction may combine with negative instru- 
mentality. Second, role outcomes with nega- 
tive attraction may combine with positive 
instrumentality. Moreover, a role outcome 
can make no contribution to the attraction of 
a work role if either its attraction or its 
instrumentality is zero. Thus, if the role out- 
come had negative attraction (e.g., blame for 
poor work), the work role would be more 
attractive to the extent that it interfered with 
the attainment of this outcome (the degree 
of negative instrumentality). On the other 
hand, if the role outcome had positive at- 
traction (e.g., praise for good work), the work 
role would be more attractive to the extent 
that it enhanced the attainment of this out- 
come (the degree of positive instrumentality). 
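
The multiplicative rule described above can be written as a short computation. This is a minimal sketch under stated assumptions: the numeric attraction values are arbitrary illustrative units (the theory specifies only that attraction runs from positive through zero to negative), and the function name, outcome labels, and values are hypothetical.

```python
def work_role_attraction(role_outcomes):
    """Sum of (attraction x instrumentality) over all role outcomes.

    role_outcomes maps an outcome label to a pair
    (attraction, instrumentality), with instrumentality in [-1.0, +1.0].
    """
    return sum(attraction * instr for attraction, instr in role_outcomes.values())


example_outcomes = {
    # Positive attraction combined with positive instrumentality: adds to attraction.
    "praise for good work": (2.0, 0.5),
    # Negative attraction combined with negative instrumentality: also adds.
    "blame for poor work": (-2.0, -0.5),
    # Zero instrumentality: no contribution, however attractive the outcome.
    "money": (3.0, 0.0),
}

total = work_role_attraction(example_outcomes)  # 1.0 + 1.0 + 0.0

# Positive attraction with negative instrumentality lowers work role attraction.
lowered = work_role_attraction({"praise for good work": (2.0, -0.5)})
```

The example covers the cases enumerated in the text: two ways in which a role outcome raises work role attraction, one way in which it lowers it, and the zero-contribution case.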

The diagram in Figure 1 represents the ex-
tended model of work role attraction. This
diagram shows how work role attraction de-
pends upon the attraction (preference) for
various role outcomes and the perceived in-
strumentality relationship between the work
role and the various role outcomes. Moreover,
a person’s work role satisfaction is predicted
from his perceived work role attraction.

[Figure 1 (diagram): for a Person, Work Role Attraction (Satisfaction) is the sum of (A × I), where A is the Attraction of Role Outcomes and I is Instrumentality; each work role is linked to role outcomes (e.g., achievement, recognition).]

Fig. 1. Extended model of work role attraction.


Responsivity of Instrumentality 


A set of questions that Vroom (1964) does 
not consider are perceptual consequences of 
certain work experiences. For example, what 
are the consequences of attaining a work role 
and also certain role outcomes on instrumen- 
tality or attraction? If instrumentality theory 
is to possess practical utility, it must be 
known to what extent perceived instrumen- 
tality is influenced by actual work experi- 
ence as opposed to cognitive manipulations. 
An experience of attaining a role outcome 
while in a work role will increase the perceived 
instrumentality of that work role for attaining 
like role outcomes only to the extent that 
instrumentality is influenced by actual work 
experience. To test this, the following is hy- 
pothesized: If a role outcome is attained fol- 
lowing the attainment of a work role, higher 
perceived instrumentality of that work role 
for the attainment of like role outcomes will 
result. 


Probability of Work Behaviors 


The probability that a person will perform 
an act to attain or maintain a work role de- 
pends upon the attraction of that work role 
and the perceived expectancy that performing 
the act will lead to the attainment or main- 
tenance of that work role. The attraction 
of the work role was discussed in the last 


section. Now the concept of expectancy must 
be considered. Perceived expectancy is de- 
fined as the degree of belief that an act will 
lead to the attainment of a work role. Ex- 
pectancy is viewed as the subjective prob- 
ability that performing an act will lead to the 
attainment of a work role. Expectancies are 
assumed to vary from .00 to 1.00. An ex- 
pectancy of 1.00 indicates a belief of certainty 
that performing the act will lead to the attain- 
ment of the work role; an expectancy of .00 
indicates a belief of certainty that performing 
an act will not lead to the attainment of the 
work role; and an expectancy of .50 indicates 
a belief of complete uncertainty. Although 
expectancy was described in terms of the at- 
tainment of a work role, this same descrip- 
tion applies also to the maintenance of a 
work role. 

The attraction of the work role and the 
expectancy are assumed to combine in a 
multiplicative manner to produce the prob- 
ability of the act being performed. In this 
way, expectancies are allowed to moderate 
the relationship between the attraction of the 
work role and the probability of the act. One 
basic assumption of this model is that be- 
havior is directed toward the attainment of 
favorable outcomes and away from unfavor- 
able outcomes. A second basic assumption of 
this model is that the probability of an act 
depends not only on the attraction of the 
goal, but also on the relative odds related to 
striving for the goal. The multiplicative man- 
ner of combining the attractiveness of a work 
role and expectancy reflects this latter as- 
sumption. It states that the probability of 
an act can be enhanced by a high expectancy 
and decreased by a low expectancy. This 
postulate thus incorporates the common sense 
notion that only a Don Quixote would reach 
for an unreachable star. 
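
The combination of work role attraction and expectancy can be sketched in the same way. The text calls the product the probability of the act; because attraction is not confined to the range .00 to 1.00, the sketch below treats the product as a relative index of the tendency to act, which is an interpretive assumption. The function name and the numeric values are hypothetical.

```python
def act_tendency(work_role_attraction, expectancy):
    """Relative tendency to perform an act: work role attraction x expectancy.

    expectancy is the subjective probability (0.0 to 1.0) that the act
    leads to attainment or maintenance of the work role.
    """
    if not 0.0 <= expectancy <= 1.0:
        raise ValueError("expectancy must lie between 0.0 and 1.0")
    return work_role_attraction * expectancy


# A very attractive role that is believed to be unattainable: no tendency to act.
unreachable = act_tendency(4.0, 0.0)

# A very attractive role with low expectancy versus a moderately
# attractive role with certain attainment.
long_shot = act_tendency(4.0, 0.25)
sure_thing = act_tendency(2.0, 1.0)
```

In this illustration the less attractive but certain role yields the stronger tendency to act, which is the moderating effect of expectancy described above.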

The diagram in Figure 2 represents the 
extended model of the probability of an act 
being emitted. This diagram shows how work 
role attraction and expectancy combine to 
predict the probability of the act. 

[Figure 2 (diagram): a Person chooses between the acts Superior Effort and Standard Effort; each act is linked by an Expectancy (E) to the attainment or nonattainment of Effective Performance or of Standard Performance, each of which carries an attraction of (A × I); Probability of the Act = (A × I)E.]

Fig. 2. Extended model of the probability of an act.

This extended model differs from Vroom’s
original model in two major respects. First,
this model distinguishes clearly between work
roles and role outcomes, whereas Vroom’s
model at best implies the natures of different
kinds of outcomes. In Vroom’s original form-
ulation, it often is difficult to determine if his
outcomes are more like our work roles than 
our role outcomes or vice versa. Unless this 
distinction between kinds of outcomes is made 
explicitly, predictions from instrumentality 
theory rapidly degenerate into sterile com- 
plexity. If no sharp distinction is made be- 
tween kinds of outcomes, and thus instru- 
mentality relationships hold between each 
pair of possible outcomes, the prediction of 
job satisfaction would be a function of (a) 
the products of attraction and instrument- 
alities of the job situation for its “immediate” 
consequences and (b) the products of at-
traction and instrumentalities of these “im-
mediate” consequences for “less immediate”
consequences. 

A second major difference between the two 
models is the clear developmental implications 
of the extended model. The emphasis of the 
extended model on historical as opposed to 
ahistorical elements allows for better develop- 
mental studies of the formulation of work 
personality and work motivation. This orienta- 
tion toward understanding the developmental 
processes of work motivation is the major 
advantage of the extended model. 


EXPERIMENTAL SIMULATION 


This developmental model of instrumen- 
tality theory hypothesizes causal links be- 
tween its major variables. These causal 
hypotheses can be tested only in research de- 


signs that contain experimental controls ade- 
quate to produce information on the direction 
of influence between variables. Although in- 
creasing the amount of experimental control 
usually results in increased knowledge as to 
the direction of influence between variables, 
it also usually results in a decrease in the 
scope of the information produced. In terms 
of research designs, the correlational or field 
survey design tends to maximize the scope 
(complexity) of the information produced and 
minimize the depth (knowledge of direction- 
ality) of this information. At the other ex- 
treme, laboratory experiments tend to maxi- 
mize the depth and minimize the scope of the 
information produced. In contrast to these 
two extremes, the design used in this study 
represents an attempt to enhance both the 
scope and the depth. This type of research 
design should produce a kind of information 
not producible by either of the extreme 
designs. 

The design used in this study, called an 
experimental simulation, attempts to create 
a situation that contains most of the theo- 
retically important elements of the “real” 
situation that it is designed to simulate, and 
at the same time, attempts to maintain as 
much experimental control as possible. This 
study was designed to simulate a work organ- 
ization operating under different organiza- 
tional climates. Special procedures used to 
create the simulation included: (a) perform-
ing the experiment in a business setting with
many of the usual props (e.g., personnel
manager, selection testing, company offices,
etc.), (b) providing an organizational purpose
for the work, (c) hiring female applicants
from the local labor market through a stand-
ard recruitment and selection process, (d)
using realistic tasks, (e) keeping employees
unaware of the fact of the experiment until 
it had been completed, and (f) embedding 
the treatment manipulations within an organ- 
izational procedure. In contrast to the usual 
field experiment, in this study experimental 
control rested fully with the investigator. 

The organizational climates that the treat- 
ment manipulations were designed to create 
were (a) a reciprocating climate, (b) a
prompting climate, and (c) a control climate. 
The reciprocating climate is defined as a situa- 
tion in which the attainment of role outcomes 
is viewed as contingent upon effective per- 
formance. This implies that the organization 
maintains the practice of reciprocating the 
effective performance of its members in terms 
of favorable role outcomes. The prompting 
climate is a situation in which the attain- 
ment of role outcomes is seen as an induce- 
ment to effective performance and is not seen 
as contingent upon effective performance. In 
the prompting climate, the organization main- 
tains the practice of using role outcomes to 
stimulate its members toward effective per- 
formance without establishing contingencies 
between effective performance and the sub- 
sequent attainment of role outcomes. Finally, 
the control climate is the situation in which 
the attainment of role outcomes is viewed 
neither as being contingent upon effective per- 
formance nor as an inducement to effective 
performance. In the control climate, the 
organization does not employ role outcomes as 
motivating conditions. All three of these cli- 
mates exist within modern organizations. 

These three organizational climates were 
created within the same organization through 
differential treatment of Ss. For one group 
of Ss, the organization followed the policy of 
the reciprocating climate (reward contingent 
upon effective performance). For a second 
group, the organization acted in accordance 
with the policy of the prompting climate (re- 
ward as inducement to effective performance). 
For a third group, the organization practiced 


the policy of the control climate (reward 
neither contingent upon effective performance 
nor as inducement to effective performance). 

The operational manipulations employed to 
create the organizational climates of recip- 
rocating, prompting, and control were called 
“achievement feedback,” “money,” and “con-
trol” conditions, respectively. Achievement
feedback is defined as information from a 
superior indicating effective performance 
(outstanding performance) on a previous task. 
Money is defined as information from a 
superior indicating an increase in pay di- 
rected toward improved performance but not 
contingent upon previous performance. Fi- 
nally, control is defined as information from a 
superior indicating neither effective perform- 
ance nor a raise in pay directed toward im- 
proved performance. Thus, the manipulations 
differ in terms of the contingencies employed 
and in terms of the role outcomes used. The 
nature of the organizational climates of in-
terest requires that different role outcomes be
employed. The reciprocating climate stipulates 
that rewards must be contingent upon effec- 
tive performance. As a consequence of this 
stipulation, whatever favorable role outcome 
is placed in the contingency necessarily will 
also imply recognition for good work and 
possibly a sense of achievement. In contrast, 
the prompting climate requires that rewards 
must not be contingent upon effective per- 
formance but merely directed toward it. As a 
consequence of this requirement, the role 
outcomes of recognition for good work and a 
sense of achievement cannot be employed to 
create the prompting climate. 

Within each of the three organizational 
climates, the study focused on the two work 
roles of job incumbent and effective per- 
former. Corresponding to these two roles were 
two prediction models: the job incumbent 
model and the effective performer model. 
These two models differ in terms of the kind 
and number of input variables used to gen- 
erate predictions and in terms of the variable 
predicted. The input variables of the job in- 
cumbent model are the products of the per- 
ceived attraction of each role outcome and the 
instrumentality of the role of job incumbent 
for the attainment of like role outcomes— 
two-term products that are summed alge-
braically over all role outcomes. This job
incumbent model predicts overall job satis- 
faction. In contrast, the input variables of 
the effective performer model are the products 
of the perceived attraction of role outcomes 
and the instrumentality of the role of effec-
tive performer for the attainment of like role
outcomes that are summed over all role out-
comes and then multiplied by the expectancy
that increased effort will lead to more effec-
tive performance (three-term products). This
effective performer model predicts job per- 
formance; however, this model also should 
predict satisfaction to the extent that the 
effective performance contingencies are im- 
portant aspects of the overall role. 
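
The two prediction models described above can be sketched in a few lines of Python. This is an illustration only, not the scoring procedure used in the study: the function names, the dictionary layout, and all numeric ratings are assumptions introduced here. The sketch follows the verbal description, with attraction times instrumentality summed over outcomes and, for the effective performer model, the sum multiplied by expectancy.

```python
from typing import Mapping


def job_incumbent_score(attraction: Mapping[str, float],
                        instrumentality: Mapping[str, float]) -> float:
    """Sum of attraction x instrumentality over all role outcomes (two-term products)."""
    return sum(attraction[outcome] * instrumentality[outcome] for outcome in attraction)


def effective_performer_score(attraction: Mapping[str, float],
                              instrumentality: Mapping[str, float],
                              expectancy: float) -> float:
    """Summed attraction x instrumentality products, multiplied by expectancy."""
    return expectancy * job_incumbent_score(attraction, instrumentality)


# Hypothetical ratings for a single S (values are illustrative, not study data).
attraction = {"salary": 4, "recognition": 5, "accomplishment": 3}
instr_incumbent = {"salary": 3, "recognition": 2, "accomplishment": 4}
instr_performer = {"salary": 2, "recognition": 4, "accomplishment": 5}

satisfaction_predictor = job_incumbent_score(attraction, instr_incumbent)
performance_predictor = effective_performer_score(
    attraction, instr_performer, expectancy=4
)
```

Because the expectancy term multiplies the whole sum, an S who sees no chance of improving through effort receives a low effective performer score no matter how attractive the outcomes are.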

The specific hypotheses of this study are 
stated below. The first and third hypotheses 
are the predicted consequences of job experi- 
ences on instrumentality relationships. The 
second and fourth hypotheses are the predic- 
tions of the job incumbent and the effective 
performer models, respectively. 

Hypothesis I. If a role outcome is attained 
following the attainment of the role of job 
incumbent, higher perceived instrumentality 
of that role for the attainment of like out- 
comes will result. 

Hypothesis II. Satisfaction with the role of
job incumbent is a monotonically increasing 
function of the products of the attraction of 
each role outcome and the perceived instru- 
mentality of that work role for the attainment 
of like role outcomes summed over all role 
outcomes (job incumbent model). 

Hypothesis III. If a role outcome is at- 
tained following the attainment of the role of 
effective performer, higher perceived instru- 
mentality of that role for the attainment of 
like outcomes will result. 

Hypothesis IV. Job performance is a mono- 
tonically increasing function of the product 
of the attraction of the role of effective per- 
former and the perceived expectancy that in- 
creased effort will lead to more effective per- 
formance (effective performer model). 


DESIGN OF THE EXPERIMENT 


Subjects 

The Ss in this study were 169 women selected 
from 203 applicants from the local labor market. 
Each woman hired was told that the job was a 


part-time, temporary position with Decision Sys- 
tems, Inc., a fictitious company. Most of the Ss 
were between 15 and 18 yr. old, students in high 
school, and single. Although the ages ranged from 
15 to 66 yr., the sample was most representative 
of the 15-18 yr. age group. 


Instruments 


Special instruments developed to measure key 
parameters from instrumentality theory included: 
(a) measures of the perceived attraction of various 
role outcomes (role outcome preferences), (b) 
measures of the perceived instrumentality of the 
work roles of a particular job and of effective job 
performer for the attainment of eight role outcomes, 
and (c) a measure of the perceived expectancy that 
increased effort would lead to more effective job 
performance. 


Attraction 


Attraction of various role outcomes was measured
by an importance questionnaire containing
26 statements about various work role outcomes. 
Of the 26 statements, 5 each were written for the 
outcomes of achievement, salary, human relations, 
and recognition, and 1 each was written for the 
outcomes of work itself, policies and practices, tech- 
nical supervision, responsibility, working conditions, 
and promotion. Instructions asked S to rate each 
outcome on its importance in a permanent job for 
him. The response alternatives were “an unnecessary 
part of the job,” “an almost unnecessary part of the 
job,” “an important part of the job,” “an almost 
essential part of the job,” and “an essential part of 
the job.” Before rating the outcomes, Ss were in- 
structed to read carefully all 26 items to enable 
them to evaluate the importance of each statement 
relative to the other 25 statements. This attraction 
instrument was pretested and standardized on three 
different samples: 64 male college sophomores, 77 
female college sophomores, and 629 female office 
workers in one company. The two groups of students 
were administered the importance questionnaire on 
two occasions separated by 2 wk. Female office 
workers were given the questionnaire only once, 
during working hours at their place of employment. 


Instrumentality Measures 


Two different sets of instrumentality measures 
were developed: One set was designed to measure 
the perceived instrumentality of the work role of a 
particular job for the attainment of eight different 
role outcomes, and the other was designed to 
measure the instrumentality of the work role of an 
effective job performer for the attainment of the 
same eight role outcomes. These eight role out- 
comes were accomplishment, achievement feedback, 
recognition, responsibility, human relations, company 
policies and practices, salary, and working conditions. 
Instructions for the first set of measures asked S
to indicate, for each outcome, what he felt were
his chances of receiving the outcome on his present
job as compared to his previous jobs. The response
alternatives for each outcome were “much worse,”
“worse,” “same,” “better,” and “much better.” These
response alternatives were scored 1-5, respectively.
Instructions for the second set of measures asked S
to indicate, for each outcome, what he felt were his 
chances of receiving the outcome as a result of his 
effective job performance. The response alternatives 
were “none,” “slight,” “fair,” “good,” and “excellent.” 
These response alternatives also were scored 1-5, 
respectively. 


Expectancy Measure 


The particular expectancy relevant to this study 
was that increased work effort would result in more 
effective job performance. Instructions for expectancy 
asked S to indicate what he felt were his chances 
of improving his performance if he “really worked 
hard.” The response alternatives were “No chance at 
all,” “Probably would not improve,” “Do not know,” 
“Probably would improve,” and “Certain to im- 
prove.” These alternatives were scored 1-5, respec- 
tively. 


Treatment Effectiveness Measures 


Special measures also were developed to assess 
the degree to which the treatment manipulation 
created the intended effects on Ss. Two different 
sets of instruments were developed for this purpose. 
One set of instruments was contained in a question- 
naire designed to measure satisfaction with eight 
different outcomes of the job. These outcomes were 
the same ones used in the instrumentality measures, 
namely, accomplishment, achievement feedback, rec- 
ognition, responsibility, human relations, policies and 
practices, salary, and working conditions. In its final 
version, the satisfaction questionnaire contained 
scales for the above eight role outcomes and also 
scales for six outcomes serving as distractors. The 
purpose of the distractor outcomes was to dis- 
guise the outcomes of interest. Each of these 14 
scales contained three items. Instructions asked S 
to indicate how satisfied he felt with each outcome 
of the job described by the item. The response 
alternatives were “not satisfied,” “only slightly satisfied,”
“satisfied,” “very satisfied,” and “extremely
satisfied.” These response alternatives were scored 
1-5, respectively. The effectiveness of the treatment 
manipulations could be checked with these 8 satis- 
faction scales in the following manner. If achieve- 
ment feedback had its intended effects, the group 
receiving this treatment should demonstrate greater 
satisfaction than the group receiving the control con- 
dition with only the outcomes of accomplishment, 
achievement feedback, and recognition. In contrast, 
if money had its intended effect, the group receiving 
this treatment should show greater satisfaction than 
the control group with only the outcome of salary. 
Moreover, other differences appearing on these scales 
would need to be explained. 
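
The scoring of one three-item satisfaction scale might be sketched as follows. The source states only that each scale contained three items scored 1-5; summing the three item scores into a scale score is an assumption made here, and the function and constant names are hypothetical.

```python
# Response alternatives and their scores, as listed in the text above.
RESPONSE_SCORES = {
    "not satisfied": 1,
    "only slightly satisfied": 2,
    "satisfied": 3,
    "very satisfied": 4,
    "extremely satisfied": 5,
}


def scale_score(responses):
    """Sum the 1-5 scores of the three items of one scale (assumed scoring rule)."""
    if len(responses) != 3:
        raise ValueError("each satisfaction scale has exactly three items")
    return sum(RESPONSE_SCORES[response] for response in responses)


# Hypothetical responses of one S to the three salary items.
salary_satisfaction = scale_score(
    ["satisfied", "very satisfied", "extremely satisfied"]
)
```

Under this assumption a scale score ranges from 3 to 15.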

The second measure of treatment effectiveness was 
the level of performance Ss felt they attained on 
those tasks completed after receiving the treatment 


outcomes. Instructions asked S to indicate how he 
thought he performed compared to his co-workers. 
Response alternatives were “poorer,” “average,” 
“above average,” and “among the best.” These response
alternatives were scored 1-4, respectively. On
these measures, if achievement feedback had its in- 
tended effect, Ss receiving this treatment should 
indicate a higher level of attained performance than 
Ss receiving the control condition. 


Job Satisfaction Measure 


Overall job satisfaction was measured by the 
Hoppock Job Satisfaction Blank (Hoppock, 1935). 
This instrument contained four items, each with 
seven response alternatives. Although all four items 
were asked, only three were scored in the measure of 
overall job satisfaction. The item dropped asked Ss 
to check the statement best describing how they 
felt about changing jobs. This item was ambiguous 
considering the temporary nature of the job. 


Work Tasks 


The experimental tasks in this study were chosen 
so that objective performance data could be gathered, 
so that most Ss could learn and master them in a 
short time, to be sensitive to changes in the ex- 
penditure of effort, and to provide what appeared to 
be “real” work for the “employees.” Two tasks 
meeting all of these requirements are outlined below. 

For both work tasks, Ss were given a booklet 
containing 10 pages of computer output for a 155-variable
correlation matrix. The output, showing
the lower triangle of the matrix, presented one 
variable at a time with 10 coefficients per row. The 
coefficients were six-decimal numbers. This booklet 
was used for two different work tasks. The first task, 
called the search task, required Ss to find certain
specified numbers and to write each six-decimal
number on an answer sheet. The second task, called
the rounding task, required Ss to find the specified 
numbers in the same manner as on the search task, 
but once the proper number was found, it was to 
be rounded from six to two decimal places, ac- 
cording to special rules. Only the first two digits of 
the rounded number were to be written on the 
answer sheet. Rules for rounding were that Ss look 
at the number in the third place. If the number was 
5 or greater, they were to round up one number. 
If the number was 3 or less, they were to make 
no change. If the number was 4, Ss were to look at 
the number in the fourth decimal place and apply 
the rules again. The Ss were to continue this process 
until they either arrived at a number not 4 or 
exhausted all six decimal places. In the latter case,
they were to write down the full number. On both 
work tasks, Ss were to complete each item in the 
order presented. Performance on each task was 
measured in two different ways: First, quantity of 
performance was measured by the number of items 
attempted. Second, quality of performance was 
measured by the proportion of items correct of those 
attempted. 
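
One reading of the rounding rules above can be written out as a short Python function. It is a sketch under stated assumptions, not the instructions given to Ss: the function name is hypothetical, the coefficient is assumed to arrive as a string with exactly six decimal places, and the sign and leading zero are kept in the answer, a detail the text does not specify.

```python
from decimal import Decimal


def round_by_task_rules(coefficient: str) -> str:
    """Round a six-decimal coefficient to two places using the task's special rules."""
    sign = "-" if coefficient.startswith("-") else ""
    whole, _, decimals = coefficient.lstrip("+-").partition(".")
    if len(decimals) != 6 or not decimals.isdigit():
        raise ValueError("expected a number with exactly six decimal places")

    kept = Decimal(f"{whole or '0'}.{decimals[:2]}")
    # Inspect the third place first, then later places only while a 4 is found.
    for digit in decimals[2:]:
        if digit >= "5":
            return f"{sign}{kept + Decimal('0.01'):.2f}"  # round up one
        if digit <= "3":
            return f"{sign}{kept:.2f}"  # make no change
        # The digit is 4: move on to the next decimal place.
    # All remaining places were 4, so the full number is written down.
    return coefficient
```

For example, `round_by_task_rules("0.344600")` passes over two 4s, reaches the 6, and returns `"0.35"`, whereas `round_by_task_rules("0.344444")` exhausts all six places and returns the full number.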

Experimental Instructions


The job was presented as an entirely new kind 
of computer-related activity. The Ss were told that 
the usual selection measures did not predict performance
on this job. Moreover, Ss were informed
that their performance on previous jobs would be 
a poor indicator of their performance on this job. 
This rationale was given to justify the blanks that 
Ss were to complete. In addition, Ss were told that 
the purpose of this temporary job was to collect 
cost information to be used in bidding for contract 
work. The importance of obtaining precise cost 
estimates for this kind of work was emphasized by 
outlining the consequences of submitting a bid based 
upon faulty cost information. The Ss were in- 
formed that precise cost information on this work 
could be gathered only if they worked at a pace 
that they usually would maintain 8 hr. a day, 
every working day. This was done in an effort to 
set the initial amount of effort at a realistic level. 
Results of a pilot study had shown that Ss main- 
tained unusually high levels of performance when 
pacing themselves for relatively short work periods. 
These instructions were designed to increase the pac- 
ing period to at least 8 hr. The Ss were told 
that should the company win the contract, contract 
work would be performed at a distant city. This 
was done so that Ss would not expect future employ- 
ment opportunities with Decision Systems. 


Treatments 


The achievement feedback, money, and control 
treatments were presented to Ss individually in a 
letter from the personnel manager of Decision Sys- 
tems. Each letter was addressed personally to S and 
was two pages in length. The first page was the 
same for all letters, and the second page contained 
the treatment information. Most of the letter was 
mimeographed with only the underlined portion 
shown below being typed. The crucial portion of 
each treatment letter is shown below. The achieve- 
ment feedback information was as follows: 


According to the results of the work samples 
which you completed the other day, your per- 
formance was AMONG THE BEST of all those 
who took the samples. You really did well on 
this kind of work. If you continue to do as well 
today, we will let you know. We are extremely 
happy with your outstanding performance. 


The money information was as follows: 
According to the results of the work samples 
which you completed the other day, your per- 
formance was ABOUT AT THE AVERAGE of all 
those who took the samples. As you know the 
usual rate of pay for this kind of work is $1.50 
per hour. However, in the hope you will do much 
better than that today, we will pay you at the rate 
of $1.75 per hour. 

The control information was as follows: 
According to the results of the work samples 
which you completed the other day, your per- 
formance was ABOUT AT THE AVERAGE of 


all those who took the samples. As you know the 
quoted wage for this kind of work is $1.50 per 
hour. 


CONDUCT OF THE STUDY 


The location for the study was a large conference 
room of a downtown hotel. This room usually was 
used for conferences and training courses by local 
business groups. This location was chosen to en- 
hance the simulation of an actual business endeavor. 

Data collection was divided into two sessions 
separated by one day. In the first session, applicants 
were given a personal history blank and the im- 
portance questionnaire, told about the company and 
the purpose of the work, trained on the tasks, and 
tested on work samples. Based on this information, 
Ss were chosen from the applicant pool and assigned 
to homogeneous ability and outcome preference 
groups. Within each group, Ss were assigned 
randomly to the following treatment conditions: (a) 
achievement feedback, (b) money, and (c) control.
In the second session, Ss were given two pretreat- 
ment tasks, presented with the treatments, and 
administered posttreatment measures. Finally, Ss 
were debriefed and paid. The posttreatment measures 
included: (a) quantity and quality of performance 
on four tasks, (b) instrumentality and expectancy
measures, (c) the Hoppock measure of overall job 
satisfaction, and (d) measures designed to assess 
the effectiveness of the treatment manipulations. 
These data were analyzed using analysis of variance 
and correlation analysis. 

The study was conducted by E, who was kept
entirely unaware of the specific hypotheses under 
investigation. Moreover, E was not informed of the 
treatment that any individual S received. This was 
done to minimize any effects due to E bias (Rosen- 
thal, 1964). 


First Session 


After the applicants completed the application 
blank and the importance questionnaire (attraction 
measure), E read an orientation speech. This orientation
speech told the applicants about Decision Systems,
Inc., and the particular job. This speech thus
contained the overall experimental instructions or set 
for the entire study. After this experimental set was 
established, applicants were trained on the search 
and rounding tasks. Training procedures emphasized 
a single method of doing the tasks in order to 
minimize the variance due to different work methods. 
The applicants who could not learn the tasks during 
training were rejected from participation in the 
second session of the study. After applicants were 
trained, they were given a search task and a round- 
ing task to complete as work samples. Upon com- 
pletion of the work samples, applicants were allowed 
to go. 

Based on the information obtained in the first 
session, Ss were chosen from the applicant pool and 
assigned to homogeneous ability and outcome pre- 
ference groups. This assignment was performed in 
the following manner. Scores of applicants on the 
attraction of achievement outcomes as opposed to
salary outcomes were divided at the median into 
prefer achievement and prefer salary groups. Ability, 
as measured by the number of items correct on the 
two work samples, also was divided at the median 
into high ability and low ability groups. Each S was 
classified into one of four groups: (a) high ability,
prefer achievement; (b) high ability, prefer salary;
(c) low ability, prefer achievement; and (d) low
ability, prefer salary. Within each of these groups,
Ss were assigned randomly to one of the three treat- 
ment conditions: achievement feedback, money, and 
control. Finally, within each of these 12 groups, Ss 
were assigned randomly to second sessions. About 
equal numbers of Ss from each of the 12 groups were 
assigned to each second session. Sampling procedures 
were successful to a gratifying extent (see Graen, 
1967). 
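
The assignment procedure described above (median splits on ability and outcome preference, then random assignment to treatments within each of the four groups) can be sketched as follows. This is an illustration under assumptions, not the procedure actually run: the record layout, the function name, the handling of scores tied at the median (placed in the lower group), and the round-robin allocation after shuffling are all choices made here.

```python
import random
from statistics import median

TREATMENTS = ("achievement feedback", "money", "control")


def assign_treatments(applicants, seed=0):
    """Median-split Ss into four groups, then randomize treatments within each group."""
    rng = random.Random(seed)
    ability_cut = median(a["ability"] for a in applicants)
    preference_cut = median(a["preference"] for a in applicants)

    groups = {}
    for a in applicants:
        key = (
            "high ability" if a["ability"] > ability_cut else "low ability",
            "prefer achievement" if a["preference"] > preference_cut else "prefer salary",
        )
        groups.setdefault(key, []).append(a["id"])

    assignment = {}
    for key, members in groups.items():
        rng.shuffle(members)  # random order within the homogeneous group
        for position, subject_id in enumerate(members):
            # Dealing treatments in turn keeps the group sizes about equal.
            assignment[subject_id] = (key, TREATMENTS[position % len(TREATMENTS)])
    return assignment


# Hypothetical applicant pool: 24 Ss with made-up ability and preference scores.
applicants = [
    {"id": i, "ability": i, "preference": (i * 7) % 24} for i in range(24)
]
assignment = assign_treatments(applicants, seed=1)
```

Shuffling within each group before dealing out treatments keeps ability and outcome preference approximately equated across the three treatment conditions.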


Second Session 


When Ss arrived at the second session, they were 
given assigned work places. The E then read a 
second orientation speech designed to re-establish the 
experimental set. Special reference was made to the 
work samples that Ss had completed in the first 
session, so that when the treatment letters men- 
tioned work samples, Ss might make the association 
readily. Next, training instructions were reviewed 
and a search and rounding task completed. After 
these two pretreatment tasks were finished Ss were 
asked to indicate their feelings toward the job. 
Next, they were given a 10-min. break. During the 
break, all Ss left the work room, and the letters 
containing the treatments were distributed. After 
the break, Ss were told that they had received 
letters from the personnel manager of Decision Sys- 
tems. They were told to read the letters carefully. 
When Ss had finished, the letters were collected. 
Immediately after this, Ss were given four 15-min. 
tasks: two search and two rounding. Immediately 
after the last task was completed, Ss were given a 
questionnaire containing the instrumentality, ex- 
pectancy, overall job satisfaction, treatment effec- 
tiveness, and debriefing measures. After this second 
session, Ss were debriefed and paid. 

A distinctive feature of this design was that it 
simulated many important features of the usual 
employment setting without the loss of experimental 
control often associated with the conduct of a field 
experiment (see Weick, 1967). Moreover, treatment 
and task procedures were refined or modified based 
on the results of rather extensive pretesting of the 
instruments and experimental procedures. 


Analysis 


The sampling design for this study consisted of 
12 groups: 3 treatment groups (achievement feed- 
back, money, and control) each containing 4 
homogeneous ability and outcome preference groups. 
Thus, ability and outcome preferences were equated 
for the 3 treatment groups. In the analysis, the 
treatment groups are viewed as three samples from 
different organizational climates and are analyzed 


separately. The analysis included the three-way 
analysis of variance appropriate to this sampling 
design and correlational analysis (Winer, 1962). The 
criterion measures of overall job satisfaction and 
quality and quantity of performance were analyzed 
both as raw (static) variables and as gain (dynamic) 
measures. Residual gain scores (Harris, 1963) were 
employed as measures of the gain from before to 
after the treatment administration. Residual gain 
scores are criterion scores with pretreatment scores 
partialled out (by subtracting from the posttreat- 
ment score the score predicted from knowledge of 
the pretreatment score). Thus, the raw variables are 
after-only measures, and the residual gain variables 
are before—after measures. The pretreatment scores 
used in calculating residual gain scores were those 
collected during the second session, immediately prior 
to the administration of the treatments. 


RESULTS 
Procedural Checks 


One set of measures designed to assess the 
treatment effectiveness of the role outcome 
manipulations was the set of outcome satis- 
faction scales. On these measures, if achieve- 
ment feedback created its intended effects on 
Ss, the group of Ss receiving it should show 
more satisfaction than the group receiving the 
control outcome with the amount of ac- 
complishment, achievement feedback, and 
recognition available on the job. As shown in 
Table 1, the achievement feedback group was 
more satisfied than the control group with 
achievement feedback and recognition but not 
with accomplishment. The nature of the tasks 
probably accounts for the achievement group 
not demonstrating higher satisfaction with accomplishment.
On the criterion tasks, no S
completed even half of the available items, 
and during debriefing, several Ss stated a 
desire to complete the remaining items on the 
tasks. Apparently, the uncompleted tasks sup- 
pressed satisfaction with accomplishment. 
Turning to the money outcome, if the money 
treatment had its intended effects, the group 
receiving it should be more satisfied than the 
control group with salary, but not with ac- 
complishment, achievement feedback, or rec- 
ognition. According to Table 1, the money 
group was more satisfied than the control 
group on a single scale—salary. In addition 
to those effects intended for the treatments, 
no other significant difference was shown on 
the satisfaction scales. 

TABLE 1
MEANS ON THE SATISFACTION SCALES

                                     Treatment group
Satisfaction with          Achievement    Money       Control
outcomes                   (N = 56)       (N = 57)    (N = 56)

Accomplishment              9.61           9.51        9.04
Achievement feedback       10.43*          9.51        9.46
Recognition                10.05**         9.32        9.20
Responsibility              9.79          10.07        9.55
Human relations             9.82           9.32        9.43
Policies & practices        9.63           9.86        9.61
Salary                     11.07          12.02**     10.52
Working conditions         11.75          11.42       11.02

*p < .05.
**p < .01.

An additional procedural check was the 
measure of the level of performance Ss felt 
they had attained on the criterion tasks. If the 
achievement feedback treatment had its in- 
tended effect, Ss receiving it should indicate 
higher perceived performance than the control 
group. The money group should not be dif- 
ferent from the control group. As shown in 
Table 2, the perceived level of performance 
was higher for the achievement group than 
the control on both the search and the round- 
ing tasks. The money and control groups did 
not differ. Finally, debriefing results also sup- 
ported the effectiveness of the treatments. 


Hypothesis I 


If a role outcome is attained following the 
attainment of the role of job incumbent, 
higher perceived instrumentality of that role 
for the attainment of like outcomes will result. 

For this hypothesis to be supported in this 


TABLE 2
MEANS ON PERCEIVED LEVEL OF PERFORMANCE
ATTAINED ON THE CRITERION TASKS

                            Treatment group
Work task         Achievement    Money       Control
                  (N = 56)       (N = 57)    (N = 56)

Search             2.49*          2.30        2.05
Rounding           2.60*          2.33        2.16

*p < .01.


TABLE 3
MEANS ON PERCEIVED INSTRUMENTALITY OF THE WORK
ROLE OF JOB INCUMBENT FOR THE ATTAINMENT
OF SELECTED ROLE OUTCOMES

                                     Treatment group
Instrumentality            Achievement    Money       Control
for outcomes               (N = 56)       (N = 57)    (N = 56)

Accomplishment              3.39           3.40        2.93
Achievement feedback        [illegible]    3.25        3.07
Recognition                 3.68**         3.04        3.05
Responsibility              3.34           3.42        3.20
Human relations             3.34**         3.02        2.96
Policies & practices        3.48           3.40        3.32
Salary                      3.95           4.19*       3.80
Working conditions          3.86           3.75        3.52

*p < .05.
**p < .01.


study, the achievement feedback group must 
show higher perceived instrumentality for 
achievement feedback and recognition out- 
comes and the money group must show higher 
instrumentality for salary outcomes. Results
bearing on this hypothesis are shown in Table 3.
According to Table 3, the achievement feed- 
back group showed higher instrumentality 
than the control group on achievement feed- 
back, recognition, and human relations. Hu- 
man relations fits into the achievement feed- 
back, recognition cluster, being concerned with 
attention to the individual. Moreover, the 
money group was higher than the control only 
on salary. These data clearly confirm the 
hypothesis that the consequence of receiving 
an outcome following the attainment of the 
work role of a particular job increases the 
perceived instrumentality of that work role 
for the attainment of like outcomes. These 
data show that instrumentalities can be re- 
sponsive to actual experience rather than 
being independent of the external environ- 
ment. At least when referring to the work 
role of a particular job, instrumentalities can 
be enhanced by the actual contingencies be- 
tween being on the job and receiving various 
outcomes. 


Hypothesis II 


Satisfaction with the work role of job incumbent
is a monotonically increasing function of the
products of the attraction of each
role outcome and the perceived instrumentality 
of that work role for the attainment of like 
role outcomes summed over all role outcomes. 

After-Only—Job Incumbent Model. Prod- 
uct-moment correlations were calculated be- 
tween components of the job incumbent model 
and Hoppock overall job satisfaction to test 
Hypothesis II in terms of its main predictions
and intermediate linkages. This analysis was 
performed separately for each of the three 
treatment groups to enable comparisons 
among the treatments. Results of this analysis 
are shown in Table 4. In this table, the in- 
dependent variables of the job incumbent 
model include the products of the perceived 
attraction and the perceived instrumentality 
of the role of job incumbent for attaining: 
(a) each intrinsic outcome (accomplishment, 
achievement feedback, recognition, and re- 
sponsibility), (b) each extrinsic outcome 
(human relations, policies and practices, 
salary, and working conditions), (c) all in- 
trinsic outcomes, (d) all extrinsic outcomes, 
and (e) all outcomes (the complete job in- 
cumbent model). Finally, the last independent 
variable is the sum of only the instrumen- 
talities without attractions. 
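
The correlational analysis described above can be sketched as a product-moment correlation computed separately within each treatment group. The function below is the standard Pearson formula. The data values and variable names are hypothetical and stand in for the job incumbent model scores and Hoppock satisfaction scores of a few Ss per group.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    x_mean, y_mean = mean(x), mean(y)
    sum_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sum_xx = sum((a - x_mean) ** 2 for a in x)
    sum_yy = sum((b - y_mean) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical (model score, satisfaction score) lists for each treatment group.
groups = {
    "achievement feedback": ([30, 42, 35, 50], [12, 15, 13, 18]),
    "money": ([28, 44, 39, 31], [14, 13, 15, 14]),
    "control": ([33, 29, 47, 40], [11, 12, 16, 14]),
}

correlations = {
    name: pearson_r(model_scores, satisfaction)
    for name, (model_scores, satisfaction) in groups.items()
}
```

Computing the coefficient within each group, rather than over the pooled sample, is what allows the treatments to be compared on how well the model predicts satisfaction.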

According to Table 4, the correlations be- 
tween the complete job incumbent model (the 
sum of all outcomes) and Hoppock satisfac- 
tion supported Hypothesis II for the achieve- 
ment feedback and the control group, but not 
for the money group. These correlations were 
.37, .03, and .35 for the achievement, money,
and control groups, respectively. Considering 
the intrinsic and extrinsic categorization of 
the role outcomes, the intrinsic class signifi- 
cantly contributed to satisfaction for the 
achievement and control groups only; the 
extrinsic class failed to contribute to any of 
the treatment groups. This difference in 
strength of relationship for intrinsic and ex- 
trinsic classes on satisfaction supports the 
conclusions from a number of studies (Ewen, 
Smith, Hulin, & Locke, 1966; Graen, 1966; 
Graen, in press; Graen & Hulin, in press; 
Wernimont, 1966) that intrinsic variables are 
more potent than extrinsic variables in their 
contributions to overall job satisfaction. 
Turning to the individual role outcomes, only 
the intrinsic outcomes were related signifi- 


TABLE 4
CORRELATIONS BETWEEN COMPONENTS OF THE JOB
INCUMBENT MODEL AND OVERALL JOB
SATISFACTION (HOPPOCK)

                                        Treatment group
Independent variable          Achievement    Money       Control
                              (N = 56)       (N = 57)    (N = 56)

Attraction × Instrumentality for:
  Accomplishment              [illegible]     .04         .42**
  Achievement feedback         .20           -.11         .26*
  Recognition                 [illegible]    -.18        [illegible]
  Responsibility              [illegible]     .08        [illegible]
  Human relations             -.09           -.05         .10
  Policies & practices         .09           -.10         .08
  Salary                       .20            .14         .12
  Working conditions           .16           -.01         .21

Sum of:
  Intrinsic outcomes          [illegible]    -.06        [illegible]
  Extrinsic outcomes           .14            .00         .19
  All outcomes                 .37**          .03         .35**
  Instrumentalities only       .62**          .24*        .39**

*p < .05.
**p < .01.


cantly to satisfaction: accomplishment for the 
achievement and control groups, achievement 
feedback for the control group, and recogni- 
tion for the achievement group. It should be 
noted that the most positive observed cor- 
relation for the money group was on the out- 
come of salary. Finally, the sum of the in- 
strumentalities only had significant correla- 
tions with satisfaction for all three treatment 
groups. The magnitudes of these correlations
were .62, .24, and .39 for the achievement, money,
and control groups, respectively. As compared 
to those from the full model, these correlations 
lend support to the conclusions of Rosenberg 


(1956) that instrumentalities may be more


potent contributions to satisfaction than at- 
tractions. It should be emphasized that our 
money manipulation rendered the group re- 
ceiving this treatment essentially unpredict- 
able. 

After-Only: Effective Performer Model. Although
instrumentality theory states no hypothesis predicting
job satisfaction from the effective performer model, it
is reasonable to expect that this work role, being an
integral part of the job situation, should be related
to overall job satisfaction. The predictions of
satisfaction from this model thus were analyzed by
correlating the components of the effective performer
model and Hoppock satisfaction. Results of this
analysis are shown in Table 5. The independent
variables in this table include the products of the
perceived attraction and the perceived instrumentality
of the role of effective performer for attaining the
following role outcomes, which are then multiplied by
expectancy: (a) each intrinsic outcome (accomplishment,
achievement feedback, recognition, and responsibility),
(b) each extrinsic outcome (human relations, policies
and practices, salary, and working conditions), (c) all
intrinsic outcomes, (d) all extrinsic outcomes, and
(e) all outcomes (the complete effective performer
model). Finally, the two remaining independent variables
are the sum of only the instrumentalities without the
attractions and the expectancy that increased effort
will lead to more effective performance.

TABLE 5
CORRELATIONS BETWEEN COMPONENTS OF THE EFFECTIVE
PERFORMER MODEL AND OVERALL
JOB SATISFACTION (HOPPOCK)

                                        Treatment group
Independent variable          Achievement    Money       Control
                              (N = 56)       (N = 57)    (N = 56)

Expectancy (E)                 .43**          .10        [illegible]

Attraction × Instrumentality for:
  Accomplishment × (E)        [illegible]     .23*       [illegible]
  Achievement feedback × (E)   .43**         [illegible]  .23*
  Recognition × (E)            .41**          .05        [illegible]
  Responsibility × (E)        [illegible]     .10        [illegible]
  Human relations × (E)       [illegible]     .05        [illegible]*
  Policies & practices × (E)  [illegible]     .01         .26*
  Salary × (E)                 .26*          [illegible]  .26*
  Working conditions × (E)     .30*          -.06         .28*

Sum of:
  Intrinsic outcomes × (E)     .40**         [illegible]  .47**
  Extrinsic outcomes × (E)    [illegible]     .04        [illegible]
  All outcomes × (E)           .45**          .09         .42**
  Instrumentality only × (E)   .61**          .22*        .43**

*p < .05.
**p < .01.

The complete effective performer model was
related significantly to satisfaction for the
achievement and control groups but not for
the money group. These correlations were
.45, .09, and .42 for the achievement, money, and
control groups, respectively. These correlations
were as strong as those for the job incumbent
model. Again considering the intrinsic and
extrinsic categorization, both classes demonstrated
significant correlations with satisfaction for the
achievement and control groups but not for the
money group. This finding was in contrast to that
using the job incumbent model. Also in contrast to
the results using the job incumbent model, more
of the correlations between the individual role
outcomes and satisfaction were significant.
For the achievement and control groups all
correlations involving individual role outcomes
were significant, with the exception of the human
relations outcome for the achievement group.
Only accomplishment showed a significant
relationship for the money group. As with the
job incumbent model, the sum of instrumentalities
had significant correlations for all
treatment groups: .61, .22, and .43 for the
achievement, money, and control groups,
respectively. Finally, the correlations between
expectancy and satisfaction were .43, .19, and
.33 for the achievement, money, and control
groups, respectively. The strength of these
relationships indicates that the perceived
opportunity to influence one’s performance is
an important determinant of job satisfaction.

Before-After—Job Incumbent. A major dif-
ference between after-only and before-after
measurement is that the after-only measures
are influenced by status differences among
Ss, such as differences in abilities, work per-
sonalities, and past work experiences, in ad-
dition to the effects of the treatments,
whereas the before-after measures minimize
the influence of these status differences.
These status differences are minimized by
using scores that represent the change in the
variable of interest in the interval of time
between the pretreatment and the posttreat-
ment measures. In this manner, each S serves
as his own control and the variation attribut-
able to the influence of the treatment itself
is maximized. Thus, gain scores are more
appropriate than raw scores for testing the
hypotheses of instrumentality theory. In this
study, residual gain scores were employed as
the measures of change due to their desirable
psychometric characteristics (Harris, 1963).
The time interval between pre- and post-
measures for all gain scores was less than
2 hr.
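
A residual gain score can be illustrated with a short Python sketch. The text does not give the exact computation, so this assumes the usual definition: the posttreatment score minus the value predicted from the pretreatment score by a simple least-squares regression, which is what "partialling out" the pretreatment measure amounts to. The function name and the sample scores are hypothetical.

```python
def residual_gains(pre, post):
    """Residual gain: posttest score minus the value predicted from the pretest."""
    n = len(pre)
    if n != len(post) or n < 2:
        raise ValueError("pre and post must be equal-length sequences of at least 2 scores")
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    ss_pre = sum((x - mean_pre) ** 2 for x in pre)
    if ss_pre == 0:
        raise ValueError("pretest scores must vary")
    # Least-squares slope and intercept for predicting post from pre.
    slope = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post)) / ss_pre
    intercept = mean_post - slope * mean_pre
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


# Invented scores for four subjects.
pre_scores = [1.0, 2.0, 3.0, 4.0]
post_scores = [2.0, 4.0, 6.0, 9.0]
gains = residual_gains(pre_scores, post_scores)
print([round(g, 3) for g in gains])
```

By construction these residuals sum to zero and are uncorrelated with the pretreatment scores, which is why they remove the status differences that raw posttreatment scores carry.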

The correlations of the components of the 
job incumbent model on the gain in overall 
job satisfaction are shown in Table 6. The 


independent variables are the same as in
the analysis of raw satisfaction. According to
Table 6, the correlations using the job in-
cumbent model are quite different on the gain
in satisfaction from those on raw satisfaction.
The prediction of the complete job incumbent
model (the sum of all outcomes) was sup-
ported only for the achievement group. This
was also the case for the intrinsic class of
variables, whereas the extrinsic class showed
no significant correlations. Of the individual
role outcomes, only achievement feedback and
responsibility had significant correlations, and
these were only for the achievement group.
Finally, the sum of the instrumentalities
demonstrated a significant correlation for only
the achievement group. These results imply
that the generally more positive results em-
ploying the raw satisfaction scores probably
were reflecting the influences of response sets
(Guilford, 1954). These results on the gain in
satisfaction support Hypothesis II for the
achievement group, but not for the money or
control groups.


TABLE 6

CORRELATIONS BETWEEN COMPONENTS OF THE JOB INCUMBENT
MODEL AND GAIN IN SATISFACTION

                                        Treatment group
Independent variable            Achievement    Money      Control
                                 (N = 56)     (N = 57)    (N = 56)
Attraction × Instrumentality for:
  Accomplishment                   .19          -.12         .10
  Achievement feedback         [illegible]      -.08        -.06
  Recognition                      .19          -.06         .10
  Responsibility               [illegible]       .18         .07
  Human relations                  .02          -.03         .05
  Policies & practices             .16          -.19        -.15
  Salary                           .14           .02        -.13
  Working conditions               .18           .00        -.01
Sum of:
  Intrinsic outcomes           [illegible]      -.04         .07
  Extrinsic outcomes               .19          -.07        -.10
  All outcomes                     .29*         -.07        -.02
  Instrumentalities only           .44**         .05     [illegible]

Note. Gain in satisfaction scores (residual gains) are the
posttreatment Hoppock scores with immediate pretreatment
satisfaction partialled out.
* p < .05.
** p < .01.

Before-After—Effective Performer Model.
The results of the correlations between com-
ponents of the effective performer model and
gain in overall job satisfaction are shown in
Table 7. The independent variables in Table 7
are the same as in the analysis of raw satis-
faction using this model. According to Table
7, the correlations between expectancy and
gain in satisfaction are .32, .02, and .15 for
the achievement feedback, money, and control
groups, respectively. As with the job incum-
bent model, the complete effective performer
model (the sum of all outcomes) significantly
predicted the gain in satisfaction for only the
achievement group. This also was the case for
both the intrinsic and the extrinsic classes of
role outcomes. In contrast to the results with
the job incumbent model, all of the correla-
tions of the individual role outcomes were sig-
nificant for the achievement group, with the
exceptions of human relations and working
conditions. It should be noted that the out-
come of policies and practices was negatively
correlated with satisfaction for the money
group. Again in contrast to the results using
the job incumbent model, the sum of instru-
mentalities was related significantly for both
the achievement and control groups. Finally,











TABLE 7

CORRELATIONS BETWEEN COMPONENTS OF THE
EFFECTIVE PERFORMER MODEL AND
GAIN IN SATISFACTION

                                        Treatment group
Independent variable            Achievement    Money      Control
                                 (N = 56)     (N = 57)    (N = 56)
Expectancy (E)                 [illegible]       .02         .15
Attraction × Instrumentality for:
  Accomplishment × (E)         [illegible]       .06         .19
  Achievement feedback × (E)   [illegible]       .00         .06
  Recognition × (E)            [illegible]      -.05         .20
  Responsibility × (E)         [illegible]       .18     [illegible]
  Human relations × (E)        [illegible]      -.04         .07
  Policies & practices × (E)       .30*         -.25*        .00
  Salary × (E)                 [illegible]      -.02         .02
  Working conditions × (E)         .18          -.06         .06
Sum of:
  Intrinsic outcomes × (E)         .44**         .08         .20
  Extrinsic outcomes × (E)         .30*         -.12         .06
  All outcomes × (E)               .40**        -.03     [illegible]
  Instrumentalities only × (E)     .40**         .02     [illegible]

Note. Gain in satisfaction scores (residual gains) are the
posttreatment Hoppock scores with immediate pretreatment
satisfaction partialled out.
* p < .05.
** p < .01.


the remarks made in connection with the dif- 
ferences between the results of the after-only 
and the before-after analyses employing the 
job incumbent model also apply to these 
differences using the effective performer 
model. Moreover, the effective performer 
model successfully predicted the gain in 
satisfaction for only the achievement group. 
When the variance attributable to the in- 
fluence of the treatments is maximized, neither 
of the two prediction models of instrumen- 
tality theory predicted overall job satisfaction 
for either the money or the control group. 
At this point we will consider the question 
of the differences between the two work roles 


of job incumbent and effective performer. Our
results thus far indicate that models based
upon these two work roles lead to somewhat
different predictions. It now
would be informative to consider the correla- 
tions of the major variables employed in both 
models. The results of this analysis are shown 
in Table 8. The correlations in Table 8 are 
those between corresponding components of 
the job incumbent model and the effective per-
former model. The correlations between the
complete models (the sum of all outcomes)
for the roles of job incumbent and effective
performer were .73, .46, and .68 for the
achievement, money, and control groups, re-
spectively. These correlations indicate that
these two models probably are not tapping 
very different sources of variation for, at least, 
the achievement and the control groups. This 
also is the case for the two classes of out- 
comes (intrinsic and extrinsic). In contrast, 
the sum of instrumentalities appears to be 
measuring similar sources of variation for 
primarily the achievement group. Although 
both models deserve further research, the 
effective performer model should be given the 
higher priority. 


Hypothesis III 


If a role outcome is attained following the 
attainment of the role of effective performer, 
higher perceived instrumentality of that role 
for the attainment of like outcomes will result. 

If this hypothesis is to be supported in the 


TABLE 8

CORRELATIONS BETWEEN COMPONENTS OF THE
JOB INCUMBENT AND EFFECTIVE
PERFORMER MODELS

                                        Treatment group
Component                       Achievement    Money      Control
                                 (N = 56)     (N = 57)    (N = 56)
Sum of:
  Intrinsic outcomes           [illegible]       .36         .62
  Extrinsic outcomes               .64           .56     [illegible]
  All outcomes                     .73           .46         .68
  Instrumentalities only       [illegible]   [illegible]     .47

Note. Components of the effective performer model have
been multiplied by expectancy.


TABLE 9

MEANS ON PERCEIVED INSTRUMENTALITY OF THE WORK
ROLE OF EFFECTIVE PERFORMER FOR THE
ATTAINMENT OF SELECTED ROLE
OUTCOMES AND EXPECTANCY

                                        Treatment group
Instrumentality for outcomes    Achievement    Money      Control
                                 (N = 56)     (N = 57)    (N = 56)
Accomplishment                     3.66          3.47        3.36
Achievement feedback               3.59*         3.44        3.18
Recognition                        3.46*         3.25        3.02
Responsibility                     3.43          3.42        3.21
Human relations                [illegible]   [illegible] [illegible]
Policies & practices               3.52*         3.37        3.21
Salary                         [illegible]       3.68        3.39
Working conditions             [illegible]       3.47    [illegible]
Expectancy                         4.18          4.12        4.09

* p < .05.


present study, the achievement feedback group 
must demonstrate higher instrumentality than 
the control group between the role of effec- 
tive performer and the outcomes of achieve- 
ment feedback and recognition. In contrast, 
the money group must not show higher in- 
strumentality than the control group for salary 
outcomes, because the raise in pay was not 
contingent upon effective performance. This 
difference is predicted from the nature of the 
contingencies contained in the achievement 
feedback and money treatments. Achievement 
was contingent upon effective performance 
and money was not. 

Results on this hypothesis are shown in 
Table 9. The achievement feedback group 
was higher than the control group on achieve- 
ment feedback, recognition, policies and prac- 
tices, and working conditions. Differences on 
policies and practices and working conditions 
were not predicted from the hypothesis. Pos- 
sibly, the contingency between effective per- 
formance and treatment outcomes generalized 
to these outcomes. As predicted, perceived 
instrumentality for the money group was not 
higher than the control group on salary. Al- 
though the treatment group means on salary 
were in the right direction, the differences 
were too small to be reliable. These data sup- 
port the hypothesis that the consequence of 


receiving an outcome contingent upon the work 
role of effective performer is to increase the 
perceived instrumentality between that work 
role and like outcomes. These data show that 
instrumentalities are responsive to actual 
contingencies rather than being independent 
of the job situation. 


Hypothesis IV 


Job performance is a monotonically increas- 
ing function of the product of the attraction 
of the work role of effective job performer 
and the perceived expectancy that increased 
effort will lead to effective performance. 

After-Only—Effective Performer Model. 
The task performance measures were quality 
and quantity scores on two search tasks and 
two rounding tasks. The quality measure was 
the number of items correct divided by the 
number of items attempted, and the quantity 
score was the number of items attempted. The 
analysis was the same as that on satisfaction. 
The results of this analysis were that none of 
the component variables of the effective per- 
former model demonstrated any significant 
correlations with any of the task performance 
measures. Hypothesis IV of instrumentality 
theory received absolutely no support from 
this analysis. 
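
The two task measures defined above are simple enough to state as code. The following Python sketch is illustrative only; the function name and the handling of a task with no items attempted are assumptions, since the text does not say how that case was scored.

```python
def task_scores(items_attempted, items_correct):
    """Return (quantity, quality) for one task as defined in the text.

    quantity = number of items attempted
    quality  = items correct / items attempted
    """
    if items_attempted < 0 or not 0 <= items_correct <= items_attempted:
        raise ValueError("need 0 <= items_correct <= items_attempted")
    quantity = items_attempted
    # Assumption: quality is left undefined (None) when nothing was attempted.
    quality = items_correct / items_attempted if items_attempted > 0 else None
    return quantity, quality


# Invented example: 40 items attempted, 30 of them correct.
print(task_scores(40, 30))
```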

Before-After—Effective Performer Model.
In this analysis, the influence of status differ-
ences among Ss in ability and pretreatment
motivation was minimized, and the variance
attributable to the influence of the treatments
was maximized, rendering this analysis more ap-
propriate than the raw score analysis. The
task measures were residual gain quality and 
quantity scores on two search and two round- 
ing tasks. Posttreatment tasks were completed 
immediately after the treatments were ad- 
ministered. The sequence of the posttreatment 
work tasks was search (E), rounding (F), 
search (G), and rounding (H). Pretreatment 
tasks (a search and a rounding task) were 
completed immediately before the treatment 
administration. The analysis of performance 
gain scores was the same as that for satisfac- 
tion using the effective performer model. The 
results of this analysis were that the model 
predicted consistently across at least two tasks 
on only the quantity measure of the rounding
task, which was the more complex task. Results on this
rounding task are shown in Table 10. Ac-
cording to Table 10, expectancy correlated
with gain in performance .33, -.23, and .07 on
Task F and .46, -.12, and .13 on Task H for
the achievement feedback, money, and control
groups, respectively. These relationships in-
dicate that, in the achievement group, the
easier Ss felt it was to improve their perform-
ance, the more they improved their perform-
ance. In contrast, in the money group, the
more difficult Ss felt it was to improve their
performance, the more they did improve their
performance. The complete model (the sum of
all outcomes) successfully predicted the gain
in performance on the two rounding tasks for
only the achievement group. Further, both
the intrinsic and extrinsic classes contributed
to the gain in performance for the achieve-
ment group. Considering the individual role
outcomes, accomplishment, responsibility, and
working conditions consistently contributed to
the gain in performance for only the achieve-
ment group, and achievement feedback and
salary contributed to the gain in performance
on only a single task. Finally, the sum of the
instrumentalities also was related consistently 
for the achievement group. Considering the 
full model, the effective performer model did 
predict consistently for the achievement group 
and not for the money or control groups. 
These results on the rounding task support 
Hypothesis IV of instrumentality theory for 
only the achievement feedback group and not 
for the money or control groups. 

A partial explanation for the finding that 
the effective performer model predicted con- 
sistently across only the quantity measures 
of the two rounding tasks is the different reli- 
abilities of the residual gain performance 
measures. The stability reliability coefficients 
for the four residual gain measures for each 
of the treatment groups are shown in Table 
11. According to Table 11, the quality scores 
are less reliable than the quantity scores. Of 
most interest, the most reliable measure for 
the achievement feedback group was the 
quantity measure of the rounding task. If the 
effective performer model were valid under 


TABLE 10

CORRELATIONS BETWEEN COMPONENTS OF THE EFFECTIVE PERFORMER AND THE GAIN IN PERFORMANCE

                                                        Work task
Independent variable                   Rounding F                        Rounding H
                                   A          M          C           A          M          C
Expectancy (E)                [illegible]   -.23*       .07         .46**     -.12        .13
Attraction × Instrumentality for:
  Accomplishment × (E)            .28*      -.01        .10         .44**      .18    [illegible]
  Achievement feedback × (E)  [illegible]   -.09        .20     [illegible]    .03        .09
  Recognition × (E)           [illegible]   -.08        .02         .20    [illegible]   -.06
  Responsibility × (E)            .40**     -.15        .10         .40**     -.02        .00
  Human relations × (E)           .16       -.09       -.07         .19       -.07       -.06
  Policies & practices × (E)      .06       -.16       -.08         .18        .08        .00
  Salary × (E)                    .15       -.08        .14     [illegible]    .03    [illegible]
  Working conditions × (E)        .29*      -.16       -.14         .28*       .02       -.07
Sum of:
  Intrinsic outcomes × (E)        .29*      -.12        .13         .41**      .08        .07
  Extrinsic outcomes × (E)        .22*      -.16       -.05     [illegible]    .03        .00
  All outcomes × (E)              .28*      -.15        .04     [illegible]    .06        .03
  Instrumentalities only × (E) [illegible]  -.12        .08         .40**      .05        .04

Note. Letters A, M, and C represent achievement, money, and control groups, respectively. Gain in performance scores
(residual gains) are the posttreatment performance scores with immediate pretreatment performance partialled out.
* p < .05.
** p < .01.


TABLE 11

STABILITY RELIABILITY ESTIMATES FOR GAIN
IN PERFORMANCE MEASURES

                                     Reliability
Measure                        A            M            C
Search quantity            [illegible]     .80          .16
Search quality                .65      [illegible]  [illegible]
Rounding quantity          [illegible] [illegible]      .49
Rounding quality              .33          .16      [illegible]

Note. Letters A, M, and C represent the achievement,
money, and control groups, respectively. Coefficients were
calculated by correlating the two posttreatment parallel tasks.


only those conditions that prevailed for the 
achievement group, given these estimates of 
reliability, consistent results might be ex- 
pected on only the quantity measure of the 
rounding tasks. 
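
The stability estimates are described as correlations between the two posttreatment parallel tasks. A minimal Python sketch of that calculation follows. It assumes a plain Pearson product-moment correlation computed on the residual gain scores of the two tasks; the function name and the sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equal-length sequences of at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both lists")
    return sxy / math.sqrt(sxx * syy)


# Invented residual gains on the two parallel rounding tasks (F and H).
gain_task_f = [0.2, -0.1, -0.4, 0.3]
gain_task_h = [0.1, 0.0, -0.3, 0.2]
stability = pearson_r(gain_task_f, gain_task_h)
print(round(stability, 2))
```

A low coefficient for a measure limits how strongly any predictor can correlate with it, which is the argument made in the text for the quality scores.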


Summary of Results 


The results of this study may be summarized in
the following manner:

1. The procedural checks indicated that 
Ss in each of the conditions were responsive 
to the appropriate manipulations. The achieve- 
ment feedback group showed higher satisfac- 
tion than the control group with the role out- 
comes achievement feedback and recognition, 
and the money group showed higher satisfac- 
tion than the control group with the role 
outcome of salary. Moreover, the achievement 
feedback group indicated higher perceived 
performance than the control group on the 
criterion tasks. 

2. The results clearly confirmed the hy- 
pothesis (Hypothesis I) that the consequence 
of receiving a role outcome following the at- 
tainment of the role of job incumbent in- 
creases the perceived instrumentality of that 
role for the attainment of like outcomes. The 
achievement feedback group showed higher 
instrumentalities than the control group be- 
tween the role of job incumbent and the role 
outcomes of achievement feedback, recogni- 
tion, and human relations. In addition, the 
money group was higher than the control only 
on the role outcome of salary. 

3. The results concerning Hypothesis II 
(job incumbent model), considering both the 
static (raw score) and dynamic (gain score)
analyses, support this prediction of overall
satisfaction only for the group performing in 
the reciprocating climate (achievement feed- 
back group) and not for the groups perform- 
ing in either the prompting climate (money 
group) or in the control climate (control 
group). 

4. Employing the effective performer model 
to predict overall satisfaction, again con- 
sidering both static and dynamic analyses, 
resulted in significant correlations for pri- 
marily the group performing in the reciprocat- 
ing climate (achievement feedback group) 
and possibly in the control climate (control 
group) but not in the prompting climate 
(money group). 

5. The results regarding Hypothesis III 
support the prediction that the consequence 
of receiving a role outcome contingent upon 
the role of effective performer is to increase 
the perceived instrumentality between that 
role and like role outcomes. These data show 
that instrumentalities were responsive to ac- 
tual contingencies and were not independent 
of the organizational climate. 

6. The data relevant to Hypothesis IV 
(effective performer model) predicting job 
performance support this hypothesis only in 
the dynamic (gain) analysis and only for the
group performing in the reciprocating climate 
(achievement feedback group). 

7. In the satisfaction analyses, the differ- 
ences in results between static (raw score) 
and dynamic (gain score) analyses probably 
reflect the influences of response sets and 
biases. 


Discussion 


These results taken as a whole indicate that 
instrumentality theory shows promise of being 
a scientifically useful model in our attempt to 
understand work motivation. Employing spe- 
cially designed measures of the major para- 
meters of instrumentality theory in a simu- 
lated organization, this study demonstrated 
that the job incumbent model can predict job 
satisfaction and that the effective performer 
model can predict both job satisfaction and 
job performance under certain conditions. In 
addition, this study demonstrated the pre- 
dicted consequences of certain job experiences 
on perceived instrumentalities. 


The finding that both the job incumbent
model and the effective performer model pre-
dicted satisfaction and the latter predicted
performance under the conditions that existed
for the achievement feedback group (recipro-
cating climate) indicates the presence of cer-
tain boundary conditions. What were the im-
portant differences between the achievement
feedback treatment and the other two treat-
ments? Only in the achievement feedback
treatment was the contingency between effec-
tive performance and the attainment of a
favorable role outcome established in a con-
crete manner, by presenting the achievement
feedback contingent upon effective perform-
ance. In the control treatment (control cli-
mate), this contingency was implied at best
but not demonstrated: previous performance
was evaluated, but no favorable role outcome
was presented. In the money treatment
(prompting climate), this contingency was
undermined by presenting a raise in pay not
contingent upon previous performance.

This interpretation was supported by the
results of the before-after analysis on overall
job satisfaction and on the quantity of per-
formance on the rounding task. The correla-
tions between the expectancy that increased
effort would lead to more effective perform-
ance and both satisfaction and performance
reflected the strength of the established con-
tingencies between effective performance and
role outcomes. These correlations on satisfac-
tion were .32, .02, and .15 and on performance
were .33, -.23, and .07 for Task F, and .46,
-.12, and .13 for Task H for the achieve-
ment feedback, money, and control groups,
respectively. Therefore, an important bound-
ary condition for instrumentality theory is
that contingencies must be established in a
concrete manner between effective job per-
formance and attaining favorable role out-
comes.

The discovery of boundary conditions for
instrumentality theory suggests that one
reason other theories of work motivation have
not been supported by empirical studies is
that they have failed to specify boundary
conditions. If the present study had not em-
ployed the reciprocating climate, the results
of this study would not have supported either
model from instrumentality theory. In fact,


the results would have been considered dam- 
aging to instrumentality theory. The point of 
this is that unless the boundary conditions of 
a theory can be specified, the theory applies 
under all conditions and can be tested legiti- 
mately under all conditions. It is unreasonable 
to assume that theories of work motivation 
can be applied to all conditions. An un- 
fortunate consequence of not being able to 
specify boundary conditions for our theories is 
that only “wide-band” theoretical formula- 
tions incapable of empirical disproof survive 
to haunt our textbooks and our students. 

The discovery of boundary conditions for 
instrumentality theory, if confirmed, has im- 
plications for the design of work organizations. 
If a goal of a work organization is to under- 
stand and predict work role satisfaction and 
performance, the boundary conditions of in- 
strumentality theory must be designed into 
the work situation. Instrumentality theory or 
any theory of work motivation can help to 
make work behavior understandable only 
after the boundary conditions have been met. 
In short, if the goal is to have employees 
respond to an organization in an understand- 
able and predictable manner, the organization 
must be designed in such a way that em- 
ployees perceive it as an understandable and 
predictable system. Employees’ work motiva- 
tion will be puzzling and unpredictable to the 
extent that the organization’s behavior toward 
its employees is perceived by the employees 
as puzzling and unpredictable. 

In addition to understanding and predic- 
tion, instrumentality theory promises to be 
useful for the enhancement of work motiva- 
tion. In this study, it was shown that the per- 
ception of instrumentality relationships was 
responsive to the actual contingencies of the 
job situation rather than independent of the 
job environment. Thus, the cognitive manip- 
ulations that prove so troublesome for in- 
equity theory (Adams, 1963) were not found 
in the perceptions of instrumentality relation- 
ships. If these results are confirmed, instru- 
mentalities could be enhanced by designing 
the work situation to produce stronger con- 
tingencies between work roles and role out- 
comes. If this model is valid and the boundary 
conditions are met, strengthening these con-
tingencies should result in increased satis-
faction and work motivation. Thus, these re-
sults suggest the possibility that if work
organizations can be designed or restructured 
to be responsive to the work personalities of 
individuals, employees’ responses to work 
organizations may be understandable, predict- 
able, and reciprocal. 

Instrumentality theory did not successfully 
predict raw job performance for any of our 
three treatment groups. In fact, none of the 
parameters of instrumentality theory showed 
any consistent relationship to job perform- 
ance. In addition, Galbraith and Cummings 
(1967) attempted to test this hypothesis 
predicting job performance for operatives in 
a heavy equipment manufacturing company. 
These authors found so few statistically sig- 
nificant differences relative to the number of 
their tests that the most reasonable inter- 
pretation is that observed differences were due 
to the operation of chance alone. However, it 
should not be expected that instrumentality 
predicts a performance measure that is con- 
trolled primarily by status variables. More 
appropriate tests of the hypotheses derived 
from instrumentality theory were those on 
the gains in performance; the effective per- 
former model did predict this criterion for the 
achievement feedback group. The theory will 
be considered again with suggested modifica- 
tions based upon the results of this study. 


SUGGESTED MODIFICATIONS


A basic modification of the models of in- 
strumentality theory must be the inclusion 
of the boundary conditions uncovered in this 
study. A statement of these hypothesized 
boundary conditions follows. If the job situa- 
tion is designed in such a way that perform- 
ance is evaluated and rewarded with favorable 
role outcomes and this contingency between 
effective performance and role outcomes is 
communicated to employees in a concrete 
manner, the models of instrumentality theory 
apply. If these boundary conditions are not 
met, the models of instrumentality theory can 
make no predictions. If these boundary con- 
ditions are met, the job incumbent model 
states that job satisfaction is a monotonically 
increasing function of the algebraic sum of the 
products of the perceived attraction of various 
role outcomes and the perceived instrumen- 


talities of the work role of being a job in- 
cumbent for the attainment of these various 
role outcomes. In contrast, the effective per- 
former model should be modified drastically 
to improve its prediction of job performance 
and satisfaction.
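
The job incumbent model as stated in this paragraph can be written compactly. The notation is introduced here only for illustration and does not appear in the original: S is job satisfaction, A_k is the perceived attraction of role outcome k, I_k is the perceived instrumentality of the job incumbent role for attaining outcome k, n is the number of role outcomes, and f is an unspecified monotonically increasing function.

```latex
S = f\!\left( \sum_{k=1}^{n} A_k \, I_k \right),
\qquad x_1 < x_2 \;\Longrightarrow\; f(x_1) \le f(x_2)
```

Because f is left unspecified, the model makes an ordinal claim only: a larger sum of products should not go with lower satisfaction.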

The nature of these needed modifications 
of instrumentality theory is suggested by the 
work of Fishbein and his colleagues (Ander- 
son & Fishbein, 1965; Fishbein, 1967) on 
attitude research. These investigators started 
with instrumentality theory (Peak, 1955; 
Rosenberg, 1956) and developed a more gen- 
eral theory of social attitudes. Although Fish- 
bein and his associates have been able to 
predict satisfaction toward an object using 
other cognitions about that object, Fishbein 
(1967) states that knowledge of a person’s 
attitude toward an object does not allow the 
prediction of the way he will behave toward 
that object. He also rules out such variables 
as beliefs about the outcome and behavior 
intentions toward the outcome. Instead, he 
proposes to predict behavior by employing a 
theory of behavior prediction based on 
Dulany’s (1961) theory of propositional con- 
trol. Fishbein’s theory hypothesizes that the 
probability that an individual will emit a 
given act, with respect to a given outcome, in
a given situation, is a function of the follow-
ing: (a) his beliefs concerning the conse-
quences of the particular behavior; (b) the
attraction of these consequences for him;
(c) his beliefs about what he should do under 
the circumstances; (d) his motivation to 
comply. Further, the theory specifies that 
the first two and the last two terms be multi- 
plied and the resulting two products be 
weighted in a linear regression equation to 
predict the probability of behavior. These 
regression weights would differ for each act 
and each situation. 
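
The structure of this prediction equation can be sketched in Python. This is an illustration under stated assumptions, not Fishbein's published formulation: the function name, parameter names, and numeric values are hypothetical, and summing the belief × attraction products over several consequences is an assumption made here for concreteness. What the text does specify is that terms (a) and (b) are multiplied, terms (c) and (d) are multiplied, and the two products enter a linear equation with weights estimated separately for each act and situation.

```python
def behavior_prediction(consequence_beliefs, attractions, normative_belief,
                        motivation_to_comply, w_attitude, w_norm, intercept=0.0):
    """Weighted linear combination of the two product terms described in the text."""
    if len(consequence_beliefs) != len(attractions):
        raise ValueError("each consequence needs one belief and one attraction")
    # (a) x (b): beliefs about consequences times the attraction of those consequences.
    attitude_term = sum(b * a for b, a in zip(consequence_beliefs, attractions))
    # (c) x (d): belief about what one should do times motivation to comply.
    normative_term = normative_belief * motivation_to_comply
    # The weights would come from a regression fitted for a given act and situation.
    return intercept + w_attitude * attitude_term + w_norm * normative_term


# Invented values for one person, one act, one situation.
index = behavior_prediction(
    consequence_beliefs=[0.5, 1.0],
    attractions=[2.0, 1.0],
    normative_belief=1.0,
    motivation_to_comply=0.5,
    w_attitude=0.5,
    w_norm=2.0,
)
print(index)
```

In practice the two weights would be estimated by regressing observed behavior on the two product terms across persons, which is why they are passed in here rather than fixed.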

A theoretical framework—“interdependent 
role systems”—may be developed by inte- 
grating the formulation of Fishbein, several 
ideas from Katz and Kahn, and the results of 
this study into the systems approach of modi-
fied instrumentality theory.


Overview


In general orientation, interdependent role 
systems theory views work organizations as
complex sets of interdependent work roles.
Each position within an organization may 
contain a number of these work roles and 
each role may be interrelated with a number 
of roles assumed under different positions in 
the organization. Thus, a position may be 
related to one set of positions by one work 
role and related to an entirely different set of 
positions by a second work role. Work roles 
are interdependent in the sense that the ap- 
propriate behavior specified by one role de- 
pends upon the antecedent behavior specified 
by another’s role and/or is required before a 
subsequent role behavior of another can be 
performed. In general, work roles are inter- 
dependent to the extent that individuals in 
various related positions have a vested in- 
terest in the performance of one another’s 
role behaviors. Work roles are defined as sets 
of behaviors expected and considered ap- 
propriate for an incumbent of the position 
within the organization. Some requirements 
of the work role are specified formally by the 
organization (e.g., stated in terms of operating 
procedures) while others are determined by 
the actions of interested colleagues. 


Central Role 


The key work role for this theory is that 
of effective performer. The behavior expected 
and considered appropriate for an individual 
in this role is that his performance reflects 
that he is developing and growing as an asset 
to both his occupation and his organization. 
An individual may attain and maintain this 
role without necessarily changing his perform- 
ance standing relative to others in his work 
group (in the short run). In this sense, the 
theory is a model of growth and development 
rather than a model of performance standing. 
The details and criteria of the expected be- 
haviors for this role depend on the nature 
and interrelations of the roles assumed under 
the position in question and on the nature 
and interrelations of other interdependent 
roles. In the present study, the behavior ex- 
pected of an effective performer was improved 
quantity of performance, yet it could have 
been some other aspect of behavior. Similarly, 
the role of effective performer could relate to 
that of a supervisor by specifying improvement
in planning and coordinating activities. 
In this way, the effective performer role can 
relate to a number of more general roles. 

As the discovery of boundary conditions 
in this study underlines, certain conditions 
must be met in the design of a work organiza- 
tion before the predictions from this theory 
apply. These conditions have been outlined 
above and they should be recalled again. 
In addition to boundary conditions, another 
difference between the interdependent role 
systems theory and other contemporary ap- 
proaches is the orientation toward dynamic 
as opposed to static correlations. 


Dynamic Correlations 


Interdependent role systems theory deals 
with changes in the work behavior of an in- 
dividual over time. The criteria of interest, 
using this model, are changes in the individ- 
ual’s behavior relative to his past perform- 
ance. Hence, this model requires the measure- 
ment of dynamic as opposed to static varia- 
tion. The dimension of time is our most 
faithful ally. Although static correlations (the 
correlations between prediction variables and 
the level of performance at one time period 
for a group of persons) may be appropriate 
for variables commonly assumed to be stable 
over time (e.g., abilities), static correlations 
are not desirable for dynamic variables (e.g., 
motivational variables), because they are in- 
sensitive to changes in the behavior of in- 
dividuals over time—the behavior of interest. 
In contrast, dynamic correlations (the cor- 
relations between prediction variables and the 
gain in performance over an interval of time 
for a group of persons) maximize the behavior 
of interest. The results of the present study 
supply ample evidence of this disguising effect 
of static correlations: Static correlations failed 
to support the hypothesis, although dynamic 
correlations did support the hypothesis con- 
cerning job performance. 
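
The distinction between static and dynamic correlations can be illustrated with a minimal sketch. The data below are hypothetical and are not taken from the study; the function names are illustrative only. The static correlation relates a predictor to the level of performance at one time period, and the dynamic correlation relates the same predictor to the gain in performance over an interval.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def static_correlation(predictor, performance):
    """Predictor vs. level of performance at a single time period."""
    return pearson_r(predictor, performance)


def dynamic_correlation(predictor, performance_before, performance_after):
    """Predictor vs. gain in performance over the interval."""
    gains = [after - before
             for before, after in zip(performance_before, performance_after)]
    return pearson_r(predictor, gains)


# Hypothetical scores for five persons (illustration only, not study data).
predictor = [1, 2, 3, 4, 5]
performance_before = [50, 40, 30, 20, 10]
performance_after = [51, 42, 33, 24, 15]

static_r = static_correlation(predictor, performance_after)
dynamic_r = dynamic_correlation(predictor, performance_before, performance_after)
print(static_r, dynamic_r)
```

In this constructed sample the predictor is negatively related to the level of performance but positively related to the gain, which is the kind of relationship a static correlation would disguise.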


Hypothesis Concerning Job Performance 


Without further introduction the hypothesis 
derived from interdependent role systems 
theory for predicting the gain in performance 
B = gain in performance. 

A_i = preference for outcome i (attraction). 

I_i = belief that the attainment of the 
work role of effective performer 
will lead to outcome i (instrumentality). 

I' = the difference between the subjective 
probability that the act 
involving superior effort will lead 
to more effective performance 
and that for the act involving 
standard effort. (2, — FE). 

R_j = belief as to what person j expects 
him to do or not do (received 
role). 

P_j = perceived pressure to comply 
with the expectations of person j. 

A_k = preference for the intrinsic consequence 
k of the act (attraction). 

I_k = expectancy that the act will lead 
to consequence k (expectancy). 

w_0, w_1, w_2 = beta weights of a linear, multiple 
regression equation that may 
take any values. 


The first term in this equation, the “path-goal 
utility” (Georgopoulos, Mahoney, & 
Jones, 1957), is the attitude toward the act 
as a means to attain the role of effective 
performer with its accruing role outcomes. This 
term was represented by our effective per- 
former model in the present study. Therefore, 
the above equation includes our effective per- 
former model and two additional terms. These 
additional terms refer to the external and 
internal pressures on the individual to per- 
form the act. The second term in the equation, 
“external pressure,” is the individual’s perceptions 
of what others expect him to do and the 
pressure he feels they would apply to influence 
his compliance to their expectations. The 
third and final term in the equation, “internal 
pressure,” is the individual’s perceptions of 
the probability of various intrinsic conse- 
quences of the act and his preferences for 
attaining these various consequences. 

In short, an individual’s gain in the per- 
formance of a specific act in a given situation 
is a function of the following: 


1a. His preferences for various role outcomes 
(attractions). 

1b. His beliefs that the attainment of the 
work role of effective performer will 
lead to these various role outcomes 
(instrumentalities). 

1c. His belief that increased effort in the 
performance of the act will lead to 
more effective performance (expectancy). 

2a. His perceptions of what other persons 
expect him to do (received role). 

2b. His perception of the amount of pressure 
these other persons would apply 
to influence his compliance (perceived 
pressure). 

3a. His preferences for various intrinsic 
consequences of the act (attractions 
of consequences). 

3b. His beliefs that the act will lead to 
these various intrinsic consequences 
(expectancy of consequences). 


The determinants of the gain in behavior 
assumed by this theory thus include three 
classes of variables: (a) path-goal utility, 
(b) external pressure, and (c) internal pres- 
sure. Moreover, the relative importance of 
these three classes of variables depends on 
the particular act and the particular situation. 
The form of the equation that combines these 
three components is specified as linear, multiple 
regression; however, this specification is 
intended to be responsive to research on 
decision making. 
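
The three-component structure described above can be sketched in code. The equation itself is not reproduced in this text, so the sketch rests on stated assumptions: each component is taken to be a sum of products of its paired variables, the path-goal term is scaled by the effort expectancy difference I', and the components are combined by a weighted sum with an intercept of zero. All function names and numbers are hypothetical.

```python
def path_goal_utility(attractions, instrumentalities, effort_expectancy_diff):
    """ASSUMPTION: I' times the sum over outcomes i of A_i * I_i."""
    return effort_expectancy_diff * sum(
        a * i for a, i in zip(attractions, instrumentalities))


def external_pressure(received_roles, perceived_pressures):
    """ASSUMPTION: sum over persons j of R_j * P_j."""
    return sum(r * p for r, p in zip(received_roles, perceived_pressures))


def internal_pressure(attractions, expectancies):
    """ASSUMPTION: sum over intrinsic consequences k of A_k * I_k."""
    return sum(a * e for a, e in zip(attractions, expectancies))


def predicted_gain(weights, components):
    """Linear combination of the three components with weights w_0, w_1, w_2."""
    return sum(w * c for w, c in zip(weights, components))


# Hypothetical values for one individual (illustration only).
components = (
    path_goal_utility([2, 1], [0.5, 1.0], 0.5),   # two role outcomes
    external_pressure([1, -1], [3, 1]),           # two other persons
    internal_pressure([1, 2], [0.5, 0.25]),       # two intrinsic consequences
)
weights = (1.0, 0.5, 2.0)  # hypothetical beta weights
gain = predicted_gain(weights, components)
print(components, gain)
```

In practice the weights would be estimated by multiple regression on a sample rather than fixed in advance, since the text states that they may take any values and depend on the act and the situation.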

Although the results of the present study 
supported the hypothesis about the relation- 
ship between path-goal utility (the effective 
performer model) and the gain in perform- 
ance, it should be noted that path-goal utility 
is merely one component of this larger model. 
The second component, external pressure, includes 
considerations of the direction and 
magnitude of influence from various sources 
external to the individual. Each attempt to 
exert pressure on the individual includes two 
separate aspects: (a) information about what 
he should or should not do and (b) influence 
to elicit compliance with this information. The 
information aspect of this external pressure 
includes at least the two parameters of sign 
and relevance. The sign of information is 
either prescriptive or proscriptive. In addition, 
the influence aspect has at least the two parameters 
of the strength of the influence attempt 
and the perceived power of the person attempting 
to influence. The strength of the influence 
attempt is the individual’s perception 
of the magnitude of pressure intended. In contrast, 
perceived power is the perception of 
the influencer’s ability to control various 
consequences of compliance or noncompliance, 
such as gratifications, deprivations, and 
punishments. Therefore, external pressure includes 
a variety of psychological costs and 
rewards not included under path-goal utility. 

The final component of this model, internal 
pressure, considers additional costs and rewards. 
This component includes two aspects: 
(a) the attraction of various intrinsic consequences 
of performing the act and (b) the 
expectancy that the act will lead to these 
various consequences. Some examples of favorable 
consequences of an act are satisfactions 
associated with performing the act (e.g., ex- 


associated with doing the task the way it 
should be done (e.g., complying with personal 
role expectations). In contrast, some examples 
of unfavorable consequences are fatigue, 
frustration, threats to both physical 
and psychological health and well-being, opportunity 
costs, and possible cognitive inconsistencies. 
As shown in Figure 3, the probability 
of superior effort is a function of the 
resolution of pressures toward and against 
superior effort, applied through path-goal 
utility, external sources, and internal sources. 


Hypothesis Concerning Job Satisfaction 


Work role satisfaction within this framework 
is determined by a more complex set of 
variables than have been assumed by past 
research. All three components of the job 
performance equations are employed to predict 
the gain in satisfaction. This hypothesis 
for predicting the gain in satisfaction is as 
follows: 
Fig. 3. Model from interdependent role systems theory for predicting the probability of 
superior effort. 
where 


Sn = gain in satisfaction. 

[symbol illegible] = external pressure consonant 
with the act. 

K' = internal pressure consonant 
with the act. 


This equation states that considering the 
most critical acts of a position, the gain in 
satisfaction is a function of the degree of 
path-goal utility, the amount of external pres- 
sure that is consonant with the given act 
relative to the total external pressure (both 
consonant and dissonant), and the amount of 
internal pressure that is consonant with the 
act relative to the total internal pressure. As 
shown in Figure 4, work role satisfaction is 
a function of the degree of path-goal utility 
and the extent to which both external and 
internal pressures can be resolved through 
appropriate behaviors. 
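
The two pressure terms in this verbal statement are ratios of consonant pressure to total pressure. A minimal sketch of that ratio follows, with hypothetical names and values. The treatment of the case in which no pressure is reported is an assumption, and the way the three predictors are combined is left open because the equation is not reproduced here.

```python
def consonant_fraction(consonant, dissonant):
    """Consonant pressure relative to total (consonant plus dissonant) pressure."""
    total = consonant + dissonant
    if total == 0:
        # ASSUMPTION: with no pressure at all, the fraction is set to zero.
        return 0.0
    return consonant / total


def satisfaction_predictors(path_goal, external, internal):
    """Return the three predictors of the gain in satisfaction.

    `external` and `internal` are (consonant, dissonant) pairs.
    """
    return (
        path_goal,
        consonant_fraction(*external),
        consonant_fraction(*internal),
    )


# Hypothetical values for one individual (illustration only).
predictors = satisfaction_predictors(1.5, (3, 1), (2, 2))
print(predictors)
```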

These hypotheses predict changes in effort 
and satisfaction respectively as a consequence 
of being rewarded contingent upon effective 
performance by an organization that maintains 
a reciprocating climate. Space limitations 
do not permit a more complete statement 
of this theory; however, several suggested 
avenues for future research can be 
offered. The first component of the model, 
path-goal utility, should develop from research 
on the structure and formation of attitudes 
and expectancies. The second and 
third components of the model should benefit 
from research on such concepts as power and 
authority, informational and normative in- 
fluence, role expectations, and role conflict 
resolution. In addition, many of the ideas of 
Katz and Kahn (1966) about the process of 
role taking should be refined, stated as hy- 
potheses, and tested. 


Suggestions for Future Research 


The results of this study imply that the 
work personality—work role systems approach, 
concerned with determining the effects of 
work role treatments on differing work per- 
sonalities, is as promising as the approach 
concerned only with the average behavior 
effects of work role treatments, if not more 
promising. This system approach attempts 
to capitalize on individual differences in work 
personality by making predictions based upon 
work personality—work role interactions. Re- 
sults further indicate that at least the two 
roles of job incumbent and effective performer 
make contributions to overall job satisfaction 
and the latter contributes to job performance. 
This suggests that it should prove worthwhile 
to explore other work roles, such as that of 
occupational development, within the systems 
framework. In addition, the analysis of organizational 
work groups from the work personality—work 
role systems approach should provide 
useful information on existing contingencies 
and interactions, knowledge presently lacking 
but needed for the effective employment 
of human talent. The need for a vast amount 
of additional work on the models of instrumentality 
and interdependent role systems 
theory is apparent. At the present time, instrumentality 
and interdependent role systems 
theories are best characterized as outlines 
that specify certain kinds of variables 
that should be important in understanding 
work motivation. The task of writing the text 
from either outline must rest upon the 
symbiotic interaction of researchers and 
theorists. 

Fig. 4. Model from interdependent role systems theory for predicting work role satisfaction. 
Test-wiseness on self-report personality scales was explored, using measures 
of accuracy in estimating the frequency of endorsement of personality items, 
estimating their social desirability, and identifying and “keying” items that 
measured the same factor, as well as indexes of ability to change scores on 
standard personality scales when they were administered with fake-good and 
fake-bad instructions. These variables generally did not correlate with each 
other, and they had only moderate and scattered correlations with personality 
scales administered with standard instructions. The test-wiseness measures were 
generally uncorrelated with ability and cognitive style tests and defensiveness 
scales, but they did correlate consistently with social desirability response 
style scales. 


According to test lore, people vary in their 
knowledge about tests, and this “test-wiseness” 
affects their performance on these devices—the 
more test-wise obtaining higher 
scores on ability and aptitude tests and distorting 
their scores on personality inventories 
(Anastasi, 1961; Cronbach, 1960; Ebel & 
Damrin, 1960; Fishman, Deutsch, Kogan, 
North, & Whiteman, 1964; Goslin, 1963; 
Guilford, 1959; Pauck, 1950; Thorndike, 
1949; Vernon, 1958, 1962). Despite the prevalence 
of these notions, the relevant data are 
sparse. Millman, Bishop, and Ebel (1965) 


1 This study was supported by the National In- 
under Research Grant 1 P01 HD 01762. Portions 
of this study were presented at the meeting of the 
American Psychological Association, New York, September 
1967. Thanks are due Anne Bloxom for 
locating and abstracting the studies reviewed in this 
article, Henrietta Gallagher for supervising the statistical 
analyses, and Bruce Bloxom and Fred L. 
Damarin for their critical reviews of a draft of this 
article. 

2 Requests for reprints should be sent to the author, 
Educational Testing Service, Princeton, New Jersey 
08540. 
have reviewed the scattered findings for ability 
tests and described some aspects of test-wise- 
ness on such measuring instruments. Even 
less is known about this phenomenon on per- 
sonality inventories. 

Test-wiseness on inventory measures of per- 
sonality may involve several abilities. Per- 
haps the most complex is the ability to re- 
spond in accordance with a prescribed role 
in completing a personality questionnaire. 
Responding in this way probably reflects, in 
addition to its own particular form of test- 
wiseness, the presence of other abilities in- 
volved in test-wiseness and sheer knowledge 
of the role (Bordin, 1943; Gough, 1947). 
This particular kind of “impression manage- 
ment” (Goffman, 1959) is displayed in role- 
playing studies (Dahlstrom & Welsh, 1960; 
Ellis, 1953; Waters, 1965), which compare 
scores on personality scales administered with 
standard instructions with scores on the same 
scales administered with instructions to fake 
either a good role (e.g., a superbly well-ad- 
justed person) or a bad one (e.g., a severely 
disturbed individual). People vary markedly 
in their success in faking (Gough, 1952; 
Grayson & Olinger, 1957; Hedberg, 1962; 
Hunt, 1948; Kimber, 1947; Lanyon, 1967; 
Noll, 1951). These individual differences are 
not related to intelligence (Kelly, Miles, & 
Terman, 1936; Kimber, 1947; Noll, 1951) or 
age (Kelly et al., 1936; Noll, 1951), but 
women are more skillful at faking than men 
(Kimber, 1947; Noll, 1951) and well-ad- 
justed people are generally better at it than 
maladjusted ones (Canter, 1963; Grayson & 
Olinger, 1957; Hunt, 1948; Lanyon, 1967). 
Widely different findings about the generality 
of this ability have been reported; scores on 
the same scales administered with different 
role-playing instructions were unrelated in 
one study (Kelly et al., 1936) and highly 
related in others (Hedberg, 1962; Rusmore, 
1956). 

Another potentially relevant ability is ac- 
curacy in estimating the desirability of the 
items found on personality scales. Edwards’ 
(1957) social desirability paradigm suggests 
that people’s responses on these scales depend 
on the items’ desirability in their society, im- 
plying that accurate knowledge of desirability 
is needed for socially desirable responding 
and, more generally, for dissembling on per- 
sonality inventories. The two studies relevant 
to desirability estimation lend only moderate 
support to such a conception of this ability. In 
the first (Wiggins, 1966), Ss’ accuracy in 
estimating the average desirability ratings 
made by a sample of Ss like themselves cor- 
related with skill in faking-good on MMPI 
(Hathaway & McKinley, 1951) clinical 
scales, but did not correlate with success 
in faking-good on MMPI scales measuring 
test-taking attitudes.3 In the second investigation 
(Edwards, 1965), individual Ss’ 
personal judgments of social desirability cor- 
responded closely to average social desira- 
bility ratings from another group of Ss. But 
the extent of correspondence between their 
judgments and the average ratings, which 
may be a rough indication of their accuracy 
in estimating desirability, was uncorrelated 
with their scores on Edwards’ (1957) Social 


3 These results were obtained with an “absolute accuracy” 
score, which is roughly analogous to the 
scores used in Edwards’ (1965) study as well as in 
the present one. 


Desirability (SD) scale, a measure of socially 
desirable responding. 

A similar ability is accuracy in estimating 
the “communality” (Wiggins, 1962) or fre- 
quency of endorsement of personality items. 
The desirability of personality items and 
their communality are highly related; Ed- 
wards (1953) reported that the two cor- 
related .87. The relevant findings for this 
variable are negative: Accuracy in com- 
munality estimation did not correlate with 
ability to fake good on any of the MMPI 
scales that were studied (Wiggins, 1966). 

Accuracy in analyzing a personality scale 
and determining the nature of the traits that 
it is intended to measure may also be a 
pertinent ability. Individual differences in the 
transparency of personality scales have not 
been studied in the context of test-wiseness, 
though procedures to measure this kind of 
skill have been developed and used (Hofstee 4; 
Seeman, 1952). A survey (Fiske, 1967) 
indicated that this skill may be fairly com- 
mon: A substantial proportion of the gen- 
eral population was aware that brief ver- 
sions of the personality inventories which they 
completed as part of the survey were intended 
to measure “personality” or “stability.” 

The present study was designed to investi- 
gate systematically the role of these test-wise- 
ness abilities on personality scales, using 
specially developed measures of these skills. 
The study’s specific purposes were to de- 
termine (a) the prevalence of these abilities; 
(b) their generality; (c) their relationships 
with performance on standard personality 
scales; and (d) their links with other ability, 
cognitive style, and personality variables that 
may also be implicated in performance on 
the standard personality scales or on the test- 
wiseness instruments. 


Method 
Subjects 


The Ss, paid volunteers, were 92 undergraduate 
women at an eastern state university. The results 
were analyzed for the 91 Ss for whom usable data 
were available. 

4 William K. B. Hofstee, personal communication, 
undated. 
Procedure 


All the measures were obtained at the same testing 
session. At the outset, Ss were told the following: 


I’m from Educational Testing 
Service in Princeton. We are conducting a study, 
sponsored by the United States Government, at- 
tempting to evaluate the usefulness of ability tests 
and personality questionnaires. You’ll be taking a 
variety of tests and questionnaires today. ... We 
also want to emphasize that these tests and ques- 
tionnaires are intended solely for research purposes. 


The Ss were also asked to put their names on the 
tests and questionnaires. 

The instruments were administered in the following 
order: (a) an inventory containing scales from 
the Guilford-Zimmerman Temperament Survey 
(GZTS; Guilford & Zimmerman, 1949) and using 
the standard GZTS instructions; (b) another inventory 
containing SD response style and defensiveness 
scales and administered with standard instructions 
adapted from the California Psychological Inventory 
(CPI; Gough, 1957); (c) Advanced Vocabulary 
Test (French, Ekstrom, & Price, 1963), 
Mathematics Aptitude Test (French et al., 1963), 
Letter Sets Test (French et al., 1963), Estimation 
Questionnaire (Pettigrew, 1958), and Object Sorting 
Test (Clayton & Jackson, 1961)—all ability or cognitive 
style measures; (d) Ability to Identify Items, 
Estimating Communality, and Estimating Desirability—three 
test-wiseness measures; and (e) the 
inventory containing the GZTS scales, readministered 
with fake-good instructions and then with fake-bad 
instructions. 


Test-Wiseness Measures 5 


Estimating desirability. This instrument was similar 
in rationale to Wiggins’ (1966) measure, and, to 
a lesser extent, to the one employed by Edwards 
(1965). It was constructed by selecting randomly 19 
items from each of the five Dy scales (Jackson & 
Messick, 1961) of the MMPI, after eliminating items 
on which 105 male and 85 female undergraduates at 
Stanford University differed significantly (χ² corrected 
for continuity, p < .05 6) in their endorsement 
frequencies (Wiggins, 1959, 1964a). The Dy scales 
represent five levels of SD and their items overlap 
minimally with standard MMPI clinical scales. The 


5 Estimating Desirability, Estimating Communality, 
and Ability to Identify Items, as well as their 
scoring keys, and the complete fake-good and fake-bad 
instructions have been deposited with the National 
Auxiliary Publications Service. Order Document 
No. 00363 from National Auxiliary Publications 
Service of the American Society for Information 
Science, c/o CCM Information Sciences, Inc., 22 West 
34th Street, New York, New York 10001. Remit in 
advance $3.00 for photocopies or $1.00 for microfiche 
and make checks payable to: Research and Microfilm 
Publication, Inc. 

6 All the significance tests described in this article 
are two-tailed. 
95 items were administered in random order with 
instructions “to judge how desirable the average 
college student would consider the behavior or opinion 
that is described by each statement.” Judgments were 
made on a 9-point scale, ranging from Extremely 
Undesirable to Extremely Desirable. The score was 
the product-moment correlation, transformed to 
Fisher’s z, between S’s desirability judgments for 
the 95 items and the items’ SD scale values, as ob- 
tained by Messick and Jackson (1961) from data for 
171 male and female undergraduates at Pennsylvania 
State University.7 
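
The scoring rule just described, a product-moment correlation transformed to Fisher's z, can be sketched as follows. The function names and the three-item example are hypothetical; the actual instrument used 95 items and published SD scale values.

```python
from math import atanh, sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def accuracy_score(judgments, scale_values):
    """Fisher z transform of the correlation between one S's judgments
    and the items' scale values. Undefined when r is exactly +1 or -1."""
    return atanh(pearson_r(judgments, scale_values))


# Hypothetical three-item example (illustration only).
judgments = [1, 2, 3]       # one S's ratings on the judgment scale
scale_values = [1, 3, 2]    # the items' reference scale values
score = accuracy_score(judgments, scale_values)
print(score)
```

The same function would serve for the Estimating Communality score by substituting frequency judgments and the items' actual endorsement frequencies.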

Estimating communality.8 This device paralleled 
the Estimating Desirability measure in design and 
resembled Wiggins’ (1966) measure in its general 
rationale. It consisted of 19 items randomly selected 
from each of the Dy scales, after eliminating items 
with sex differences in endorsement frequencies and 
items used on the Estimating Desirability measure. 
The items were administered in random order with 
instructions “to judge how frequently college stu- 
dents would respond ‘true’ to each statement when 
answering a questionnaire describing themselves.” 
Judgments were made on a 9-point scale that ranged 
from Extremely Infrequent to Extremely Frequent. 
The score was the transformed correlation between 
S’s frequency judgments and the items’ actual endorsement 
frequency for the Stanford undergraduates, 
combining both sexes.9 

Ability to identify items.10 This test was adapted 
from procedures employed by Seeman (1952) and 
Hofstee (see Footnote 4). It consisted of three simi- 
larly constructed subtests. Each subtest was based on 
a different published factor analysis of personality 
items that obtained eight or more rotated and in- 
terpretable factors, including a factor loaded (> .30) 
by at least 8 items. The studies used were by Com- 
rey and Soufi (1960, 1961) and Layman (1940). A 
subtest consisted of 15 items: the 8 with the highest 
loadings on the same factor and 7 others—each with 
the highest loading (>.30) on one of seven other 


7 The corresponding correlation ratio (η), based on 
the regression of the items’ SD scale values on S’s 
desirability judgments, was also computed. Twenty-eight 
of the 91 correlation ratios were significantly 
(p < .05) greater than the corresponding 
product-moment correlation coefficients, indicative 
of nonlinearity in the regression, but the product-moment 
correlation between the two alternative 
kinds of scores on this instrument—η and transformed 
r—was .96. 

8 This instrument was administered to Ss with the 
title, “Estimating Frequency.” 

9 The correlation ratio was also computed, based 
on the regression of the items’ actual endorsement fre- 
quencies on S’s frequency judgments. Eleven of the 
91 correlation ratios were significantly greater than 
their respective correlation coefficients; the correla- 
tion between the two kinds of scores on this device 
was .97. 

10 The title, “Ability to Identify Personality Charac- 
teristics,” was used in administering this instrument 
to Ss. 
factors. In choosing the main factor loaded by the 
8 items in each analysis, an attempt was made to 
obtain factors that would be as different as possible 
in the three analyses. Those employed were “gregariousness” 
(Layman, 1940—Subtest A in this instrument), 
“cheerfulness vs. depression” (Comrey & 
Soufi, 1961—Subtest B), and “poor physical health” 
(Comrey & Soufi, 1960—Subtest C). In the Layman 
analysis the items were revised so that they were 
in the first person singular, corresponding to the 
wording of the items in the two other analyses; and 
in selecting items for the main factor one item was 
excluded because it was a direct reversal of an item 
with a higher loading which had already been 
chosen for that factor. 

The 15 items in a subtest were presented in random 
order, with instructions to identify the items that 
refer to the same personality trait and to “key” 
them by indicating the response (True or False) 
that reflected the presence of the trait. 

The subtest score was the sum of (a) the number 
of items from the main factor that were identified 
as involving the same trait and correctly keyed and 
(b) the number of items not from that factor which 
S indicated did not refer to the same trait. Since 
the main factors were bipolar, S would be equally 
correct in keying the items to correspond to either 
pole. Consequently, Ss’ answers were scored twice, 
first keying them for one pole, then keying them for 
the opposite pole; S received the higher of the two 
scores. A total score for the instrument was ob- 
tained, weighting the subtest scores for optimal re- 
liability (Green, 1950). 
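
The two-pole scoring rule for a subtest can be sketched as follows. The data layout is an assumption made for illustration: each item maps to its keyed response for one pole of the main factor ("T" or "F"), or to None if it does not belong to that factor, and each answer is "T", "F", or None when S marked the item as not referring to the trait. The weighting of subtest scores for the total score is not shown.

```python
def subtest_score(item_keys, answers):
    """Score one subtest under both poles and return the higher score."""

    def score_for_pole(flip):
        total = 0
        for item_id, key in item_keys.items():
            answer = answers.get(item_id)
            if key is None:
                # Item not from the main factor: credit if S left it out.
                total += 1 if answer is None else 0
            else:
                wanted = key if not flip else ("F" if key == "T" else "T")
                # Main-factor item: credit if identified and keyed correctly.
                total += 1 if answer == wanted else 0
        return total

    return max(score_for_pole(False), score_for_pole(True))


# Hypothetical five-item subtest (the actual subtests had 15 items).
item_keys = {1: "T", 2: "F", 3: "T", 4: None, 5: None}
answers = {1: "F", 2: "T", 3: "F", 4: None, 5: "T"}
score = subtest_score(item_keys, answers)
print(score)
```

Here S keyed all three main-factor items for the opposite pole, so the first pass credits only item 4 and the second pass credits the three main-factor items plus item 4.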

Role-playing measures. These procedures were 
similar to those used in most role-playing studies. 
Four GZTS scales—General Activity, Sociability, 
Emotional Stability, and Personal Relations—were 
chosen for this purpose because they were highly 
reliable, moderately intercorrelated, roughly bal- 
anced in the proportion of items keyed yes and no, 
and judged likely to shift under role-playing in- 
structions. The scales were administered with fake- 
good and fake-bad instructions adapted from Yonge 
and Heist (1965). The fake-good instructions were 


Imagine that you are in the following situation: 
You have applied for admission to a college or 
university. As part of the selection process, you 
have been asked to complete this questionnaire. 
Since you want to be accepted for admission, you 
wish to make the most favorable impression pos- 
sible. Try to answer the questionnaire in a way 
that will make such a favorable impression on the 
admissions committee. 


The fake-bad instructions were 


Imagine that you are in the following situation: 
You have been forced to apply for admission to a 
certain university at the insistence of your parents. 
As part of the selection process, you have been 
asked to complete this questionnaire. Since you 
disagree with your parents’ decision and do not 
want to be accepted for admission to this univer- 
sity, you wish to make the most unfavorable im- 


pression possible. Try to answer the questionnaire 
in a way that will make such an unfavorable im- 
pression on the admissions committee. 


Three scores were obtained for each scale: (a) the 
content score, using the published key for the scale, 
obtained with fake-good instructions; (b) the con- 
tent score with fake-bad instructions; and (c) a 
difference score (McNemar, 1958, p. 48), represent- 
ing the estimated “true” difference between these 
two scores (i.e., the fake-bad score minus the fake- 
good score). It was anticipated that the difference 
score would be the most sensitive role-playing mea- 
sure, reflecting the actual influence of the role-playing 
instructions on Ss’ responses. The fake-good and 
fake-bad scores, used by themselves, provide no 
base line for assessing the extent to which the same 
scores would have been achieved with other role- 
playing instructions or with standard instructions. 
Although the difference score necessarily is highly 
related to the fake-good and fake-bad scores from 
which it is derived, the latter were also employed in 
this study’s analyses in order to provide continuity 
with previous investigations that used such scores.11 


Ability and Cognitive Style Measures 


Measures of two major factors in the ability 
domain—verbal comprehension and general reason- 
ing—were administered in order to determine their 
similarity to the abilities tapped by the test-wise- 
ness measures. Verbal comprehension was measured 
by the Advanced Vocabulary Test—V4 (French et 
al., 1963) and general reasoning by the Mathematics 
Aptitude Test—R2 (French et al., 1963). 

Other ability tests and the cognitive style measures 
were administered because of their potential relevance 
to performance on Ability to Identify Items. One 
such ability was induction, which might be involved 
in examining the items and picking out a common 
subset of them. The measure of this variable was the 
Letter Sets Test—Part 1 (French et al., 1963). The 
cognitive style of category width (Bruner, Goodnow, 
& Austin, 1956; Pettigrew, 1958) also seemed per- 
tinent, for a predilection for overly wide categories 
could result in the judgment that the personality 
trait common to the items was very broad, encom- 
passing all of them; a bias in favor of overly narrow 
categories would result in the opposite effect. Cate- 
gory width was measured by the Estimation Ques- 
tionnaire (Pettigrew, 1958). Scores were obtained for 
Factor 1, involving time and speed items, and Factor 
2, encompassing more general content. Similar rea- 
sons dictated the inclusion of another cognitive style, 


11 Role-playing ability could also be assessed by 
profile analysis procedures, which have the unique 
advantage, in principle, of detecting Ss with im- 
plausibly extreme scores or suspicious score patterns. 
These methods were not used in this study because 
they have only been fully developed for the MMPI 
(Dahlstrom & Welsh, 1960), necessarily requiring 
that the investigation focus on that inventory and 
involving the unduly time-consuming administration 
and readministration of the entire MMPI. 
equivalence range (Gardner, 1953). A tendency toward
broad equivalence range could result in the
judgment that all the items were similar, and, hence, 
reflect the same trait; a preference for a narrow 
range would imply that none of the items had any 
similarity. A group-administered version (Clayton & 
Jackson, 1961) of the Object Sorting Test (Gardner, 
1953) was used to measure equivalence range. Two 
scores were obtained from this test (Form I): the 
number of categories containing two or more ob- 
jects and the number of miscellaneous objects left 
uncategorized (Messick & Kogan, 1963). 

In view of the general relevance of psychological 
knowledge to test-wiseness, Ss were asked to list all 
the college-level courses in psychology that they had 
taken or were currently taking. The total number of 
credit hours was tabulated.¹²


SD and Defensiveness Measures 


Validity scales on the MMPI and CPI as well as 
SD scales were included in the battery in view of 
their pertinence to personality inventory perform- 
ance and in order to assess their similarity to the 
variables tapped by the test-wiseness measures. The 
particular scales that were chosen had been widely 
used in previous studies of this kind or represented 
distinctly different assessment strategies. Factor anal- 
yses (Edwards, 1963; Edwards, Diers, & Walker, 
1962; Edwards & Walsh, 1964; Martin, 1964; Quinn 
& Lichtenstein, 1965; Wiggins, 1964b) indicate that 
these and other such measures fall roughly into two 


12 Such variables as empathy and social intelligence 
may also be pertinent to test-wiseness, but they 
could not be included in this study because adequate 
measures of these characteristics are not available. 
groups: those that assess SD response style
(Edwards' [1957] SD scale, Stricker's [1963] SD scale,
the Wb scale [Gough, 1957], the K scale [Hathaway
& McKinley, 1951], the F scale [Hathaway &
McKinley, 1951], and the F − K index [Gough,
1950]) and those that gauge defensiveness or lying
(Wiggins' [1959] Sd scale, the Marlowe-Crowne
[Crowne & Marlowe, 1960] SD scale, and the L
scale [Hathaway & McKinley, 1951]).
Means and Theoretical Score Limits of Test-
Wiseness Measures


The means and standard deviations of the 
test-wiseness measures are reported in Table 
1, together with the theoretical minimum and 
maximum scores on each measure. 

A comparison of the mean scores with their 
theoretical limits indicates that Ss obtained 
high scores in absolute terms. The means for 
Estimating Desirability and Estimating Com- 
munality were 1.08 and .68, respectively, cor- 
responding to mean correlations of .79 and 
.59. The means for Ability to Identify Items
and for the role-playing measures were even 
closer to their theoretical limits. The mean 
was 15.86 for Ability to Identify Items, and 

13 Tables reporting the means, standard deviations,
and intercorrelations of all variables have been
deposited with the National Auxiliary Publications
Service. See Footnote 5 for ordering information.


TABLE 1

MEANS, STANDARD DEVIATIONS, AND THEORETICAL SCORE LIMITS OF
TEST-WISENESS MEASURES

                                                                Theoretical score limits
                                                                Minimum         Maximum
Measure                                    M        SD          test-wiseness   test-wiseness
Estimating Desirability                   1.08      .16            −3.00            3.00
Estimating Communality                     .68     [illegible]     −3.00            3.00
Ability to Identify Items                15.86     1.20              .00           19.39
Fake-Good: General Activity              21.84     3.35              .00           30.00
Fake-Good: Sociability                   26.82     2.40              .00           30.00
Fake-Good: Emotional Stability           26.56     2.81              .00           30.00
Fake-Good: Personal Relations            25.15     3.99              .00           30.00
Fake-Bad: General Activity                5.70     4.12            30.00             .00
Fake-Bad: Sociability                     3.38     4.73            30.00             .00
Fake-Bad: Emotional Stability             3.51     3.21            30.00             .00
Fake-Bad: Personal Relations              2.86     [illegible]     30.00             .00
Difference Score: General Activity      −10.55     5.72            28.48          −21.93
Difference Score: Sociability           −13.35     5.71            30.18          −18.74
Difference Score: Emotional Stability   −18.56     4.31            25.77          −24.37
Difference Score: Personal Relations    −17.33     4.13            21.47          −24.50
the theoretical limits on this instrument were
0 to 19.39. The means for the fake-good
scales ranged from 21.84 to 26.82, those for
the fake-bad scales ranged from 2.86 to 5.70,
and the limits on both kinds of scales were
0 to 30.00. The means for the difference
scores ranged from —10.55 to —18.56; their 
limits ranged from —18.74 to —24.50 on one 
side, and from 21.47 to 30.18 on the other. 
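
The correspondence between the mean scores and the mean correlations reported above is what one would expect if the Estimating Desirability and Estimating Communality scores were Fisher z-transformed correlations. That is an assumption here, since this section does not say how the scores were computed. A minimal Python sketch of the back-transformation, with a hypothetical helper name:

```python
import math

def z_to_r(z: float) -> float:
    """Back-transform a Fisher z value to a correlation (r = tanh z)."""
    return math.tanh(z)

# Mean scores reported in the text for the two accuracy measures.
mean_scores = {"Estimating Desirability": 1.08, "Estimating Communality": 0.68}

for name, z in mean_scores.items():
    print(f"{name}: z = {z:.2f} -> r = {z_to_r(z):.2f}")
```

Rounded to two places, the two values come out at .79 and .59, which agrees with the mean correlations given in the text.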

The fake-good means were consistently 
higher than the corresponding fake-bad means. 
These differences were highly significant (p <
.001); the t ratios, computed for dependent
groups, were 22.90 for General Activity, 35.03 
for Sociability, 42.53 for Emotional Stability, 
and 37.68 for Personal Relations. 
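
A t ratio for dependent groups is the mean of the paired differences divided by its standard error. The sketch below illustrates that computation only; the function name and the four pairs of scores are hypothetical and are not data from the study.

```python
import math
import statistics

def dependent_t(first, second):
    """t ratio for paired scores: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error

# Hypothetical scores for four Ss (not data from the study).
fake_good = [25, 27, 22, 30]
fake_bad = [5, 3, 6, 2]
t_ratio = dependent_t(fake_good, fake_bad)  # evaluated on n - 1 = 3 df
print(round(t_ratio, 2))
```

Because each S supplies both scores, the test works on the within-S differences rather than treating the two sets of scores as independent samples.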


Intercorrelations of Test-Wiseness Measures 


The product-moment intercorrelations of 
the test-wiseness measures appear in Table 2 
together with estimates of their internal-con- 
sistency reliability. Reliability was estimated 
by the Spearman-Brown formula from the 
correlation between split-halves for Estimat- 
ing Desirability and Estimating Communality, 
by Green’s (1950) procedure for Ability to 
Identify Items, by Cronbach’s (1951) Co- 
efficient Alpha for the fake-good and fake-bad 
scales, and by Lord’s (1956) Formula 27 for 
the difference scores. In this analysis, as well 
as in subsequent ones, the correlations with 
the fake-bad scales and difference scores have 
been reflected in sign so that, in effect, high 
scores on all test-wiseness measures represent 
high ability. 
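
Two of the reliability estimates named above have simple closed forms, sketched here in Python under the usual textbook definitions. The function names and example data are hypothetical, and Green's (1950) and Lord's (1956) procedures are not shown.

```python
import statistics

def spearman_brown(r_half: float) -> float:
    """Step up a split-half correlation to full-length reliability."""
    return 2 * r_half / (1 + r_half)

def coefficient_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one row of item scores per S."""
    k = len(item_scores[0])
    item_variances = [statistics.variance(col) for col in zip(*item_scores)]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: a split-half r of .50 and three Ss answering two items.
print(spearman_brown(0.50))
print(coefficient_alpha([[1, 1], [2, 2], [3, 3]]))
```

In the toy data the two items rank the Ss identically, so alpha reaches its ceiling of 1; real scales fall below that.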

Except for the .23 correlation between 
Estimating Desirability and Estimating Com- 
munality, none of the correlations between the 
four kinds of instruments—Estimating De- 
sirability, Estimating Communality, Ability 
to Identify Items, and the role-playing de- 
vices—was significant (p> .05). The role- 
playing measures generally correlated signifi- 
cantly with each other, but the correlations 
were moderate. Their median intercorrelations 
were .26 for the fake-good scales, .34 for the 
fake-bad scales, and .40 for the difference 
scores. The corresponding fake-good and fake- 
bad scales consistently correlated with each 
other—.60 for General Activity, .53 for So- 
ciability, .46 for Emotional Stability, and .33 
for Personal Relations—and fake-good and 
fake-bad versions of different scales were
frequently correlated too. 


Correlations of Test-Wiseness Measures with 
Standard GZTS Scales 


Total group. The product-moment correla- 
tions of the test-wiseness measures with the 
four GZTS scales administered with standard 
instructions appear in Table 3 for the total 
group of Ss. The Coefficient Alpha reliability 
estimates for the four scales are also reported 
in this table. 

The test-wiseness measures had some sig- 
nificant but moderate correlations with the 
GZTS scales. Apart from a .20 correlation 
between Estimating Communality and the 
Personal Relations scale, all the correlations 
of Estimating Desirability, Estimating Com- 
munality, and Ability to Identify Items were 
limited to the Emotional Stability scale (r's
were .21, .27, and −.29, respectively). The
role-playing measures generally correlated
with their own GZTS scale when it was
administered with standard instructions (the
consistent exception was the Sociability scale),
but they had few correlations with other
GZTS scales. The fake-good scales that correlated
with the corresponding standard scales
were General Activity (r = .31), Emotional
Stability (r = .32), and Personal Relations
(r = .52); the fake-bad scales that correlated
were General Activity (r = .20) and
Emotional Stability (r = .30); and the difference
scores involved were General Activity
(r = .27), Emotional Stability (r = .36), and
Personal Relations (r = .47). The role-playing
measures that correlated with standard scales
other than their own were the fake-good Personal
Relations scale (r = .30 with the Emotional
Stability scale), the fake-bad Emotional
Stability scale (r = .20 with the Sociability
scale), and the Personal Relations difference
score (r = .27 with the Emotional Stability
scale).

Subgroups defined by response styles. The 
product-moment correlations of the test-wise- 
ness measures with the standard GZTS scales, 
computed separately for Ss above and below 
the median on composite measures of SD re- 
sponse style and defensiveness, appear in 
Table 4.¹⁴
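
The composite measures and the median split used to form these subgroups can be sketched as follows. This is a minimal Python illustration, not the author's procedure: the raw scores are hypothetical, only two scales are shown, and the use of the population standard deviation and the assignment of ties to the low group are assumptions.

```python
import statistics

def standard_scores(raw):
    """Convert raw scores to z scores (mean 0, SD 1); population SD assumed."""
    mean, sd = statistics.mean(raw), statistics.pstdev(raw)
    return [(x - mean) / sd for x in raw]

def composite(scales, reversed_scales=()):
    """Equally weighted sum of standard scores; reversed scales are sign-flipped."""
    n_subjects = len(next(iter(scales.values())))
    totals = [0.0] * n_subjects
    for name, raw in scales.items():
        sign = -1.0 if name in reversed_scales else 1.0
        for i, z in enumerate(standard_scores(raw)):
            totals[i] += sign * z
    return totals

def median_split(scores):
    """Label each S 'high' (above the median) or 'low' (assumed: at or below it)."""
    cut = statistics.median(scores)
    return ["high" if s > cut else "low" for s in scores]

# Hypothetical raw scores for four Ss on two of the SD scales.
scales = {"Edwards SD": [30, 25, 20, 35], "F": [4, 6, 9, 2]}
sd_composite = composite(scales, reversed_scales={"F"})
print(median_split(sd_composite))
```

Reversing the F scale before summing keeps every component pointing in the direction of socially desirable responding, so a high composite score has a single meaning.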


14 The composite measure of SD response style
was the sum of the standard scores on Edwards' SD
scale, Stricker's SD scale, the Wb scale, the K scale,
and the F scale, weighting each score equally. Since
a high score on the F scale, unlike the other scales,
reflects low social desirability, its scores were
reversed before summing. The corresponding composite
measure of defensiveness was computed in the
same way, using Wiggins' Sd scale, the Marlowe-
Crowne SD scale, and the L scale. The Coefficient
Alpha reliability was .91 for the first composite
measure and .72 for the second.


TABLE 3

CORRELATIONS OF TEST-WISENESS MEASURES WITH STANDARD GZTS SCALES

                                                        Standard GZTS Scale
                                          General                   Emotional     Personal
Measure                                   Activity    Sociability   Stability     Relations
Estimating Desirability                     .08          .02          .21*        [illegible]
Estimating Communality                      .06         −.07          .27**         .20*
Ability to Identify Items                   .16          .04         −.29**        −.04
Fake-Good: General Activity                 .31**        .07          .04          −.01
Fake-Good: Sociability                      .04         −.03         −.09          −.10
Fake-Good: Emotional Stability              .16          .03          .32**       [illegible]
Fake-Good: Personal Relations               .04         −.10          .30**         .52**
Fake-Bad: General Activity                  .20*         .08         −.10          −.15
Fake-Bad: Sociability                      −.08          .00          .04          −.02
Fake-Bad: Emotional Stability               .15          .20*         .30**       [illegible]
Fake-Bad: Personal Relations                .14         −.02          .08           .14
Difference Score: General Activity          .27**        .09         −.05          −.11
Difference Score: Sociability              −.06         −.01          .01          −.05
Difference Score: Emotional Stability       .18          .14          .36**         .16
Difference Score: Personal Relations        .08         −.09          .27**         .47**
Internal-Consistency Reliability            .84          .85          .85           .80

* p < .05
** p < .01


There were scattered differences in the correlations
for the high and low groups. In 5
of the 60 pairs of correlations in the SD response
style analysis, the correlations for the
two groups were significantly different (p <
.05, using a z test for transformed correlations):
Estimating Desirability correlated
−.16 with the Emotional Stability scale in
the high SD response style group and .40
with this scale in the low group; the fake-good
General Activity scale correlated .55 and
.08 with the standard version of this scale in
the two groups, the fake-good Emotional
Stability scale correlated .24 and −.20 with
the Sociability scale, the fake-bad Personal
Relations scale correlated .40 and −.04 with
the standard version of this scale, and the
Emotional Stability difference score correlated
.38 and −.11 with the Sociability
scale. In the defensiveness analysis, 4 pairs
of correlations were significantly different:
Estimating Communality correlated .20 with
the Sociability scale in the high defensiveness
group and −.27 in the low one, the fake-good
Sociability scale correlated .17 and −.28
with the Emotional Stability scale in the
two groups, the fake-bad Sociability scale
correlated −.28 and .17 with the General
Activity scale, and the Emotional Stability
difference score correlated .37 and −.04 with
the Personal Relations scale.
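
The comparison of correlations between the high and low groups relies on a test for transformed correlations. A minimal Python sketch of the standard Fisher transformation test for two independent correlations follows; the function name is hypothetical, and the group sizes of 50 are placeholders, since this section does not report them.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error

# Correlations from the text; the group sizes of 50 are hypothetical.
z = fisher_z_difference(-0.16, 50, 0.40, 50)
print(round(z, 2), abs(z) > 1.96)  # 1.96 is the two-tailed .05 cutoff
```

With roughly 120 comparisons run at the .05 level, about six significant differences would be expected by chance alone, which is worth keeping in mind when reading the nine reported above.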


Correlations of Test-Wiseness Measures with 
Ability, Cognitive Style, SD, and Defensive- 
ness Measures 


The product-moment correlations of the 
test-wiseness measures with the ability and 
cognitive style tests and the SD response 
style and defensiveness measures appear in 
Table 5. Estimates of the internal-consistency
reliability of most of these measures are also 
reported. Reliability was estimated by the 
Spearman-Brown formula for all the ability 
tests except the Letter Sets Test; by the 
correlation between Parts 1 and 2 for the 
latter test;¹⁵ and by Coefficient Alpha for
the Estimation Questionnaire as well as for 
the response style and defensiveness scales. 

The test-wiseness measures did not cor- 
relate significantly with any ability test, and 
they had only scattered correlations with the 
cognitive style measures, mostly involving 
the Object Sorting Test. The fake-bad Emo- 
tional Stability scale correlated (r = —.33) 
with the Categories score on the Object Sort- 
ing Test, and so did the difference score on 
this scale (r= —.31). The fake-good, fake- 
bad, and difference measures of Sociability all 
correlated with the Miscellaneous Objects 
score on the Object Sorting Test (r's were
—.27, —.24, and —.27, respectively). Ability 
to Identify Items and the fake-good General 
Activity scale both correlated with the Factor 
1 score on the Category Width Test (r’s were 
—.22 for the former measure and .21 for the 
latter). And Estimating Communality cor- 
related .25 with psychology credit hours. 

Estimating Desirability, Estimating Com- 
munality, and Ability to Identify Items con- 
sistently had significant correlations with the 
SD measures; the two accuracy instruments 
were positively related to socially desirable 
responding, but Ability to Identify Items was 
negatively related to this response style. All 
correlated with Edwards’ SD scale (r’s were 
.26, .26, and −.22, respectively), the Wb
scale (r's were .23, .20, and −.23), the F
scale (r's were −.27, −.31, and .20), and
the F — K index (r’s were —.22, —.31, and 
.21). Both Estimating Communality and 
Ability to Identify Items correlated with 
Stricker’s SD scale, .24 for the former, —.20 
for the latter. Estimating Communality also 
correlated .23 with the K scale. 

In contrast to the wide correlations of these 
test-wiseness instruments with the SD mea- 
sures, only one correlated with any defensive- 
ness scale—Ability to Identify Items cor- 
related —.20 with the L scale. 

Virtually all the significant correlations for 
the role-playing measures involved the Emo- 
tional Stability and Personal Relations scales, 
on the one hand, and the SD scales, on the 
other; role-playing success on these GZTS 


15 This correlation was based on a subgroup of 45 
Ss with scores on both parts of this test. 


scales was positively related to socially de- 
sirable responding. The fake-good and fake- 
bad Emotional Stability scales and their 
difference score correlated .24, .26, and .29, 
respectively, with Edwards’ SD scale; the 
corresponding correlations with the Wb scale
were .23, .20, and .25. The difference score 
for Emotional Stability also correlated .20 
with Stricker’s SD scale. The pattern of cor- 
relations for the Personal Relations measures 
was similar to the one for the Emotional Sta- 
bility measures. All three measures correlated 
with Edwards’ SD scale (r’s were .23, .20, 
and .26, respectively). In addition, the fake- 
good version correlated .24 with Stricker’s 
SD scale, and the difference score correlated 
.22 with this SD scale. The fake-bad scale 
and the difference score correlated —.24 and 
—.20, respectively, with the F scale. 

Like the other test-wiseness instruments, 
the role-playing measures had few correla- 
tions with the defensiveness scales. The fake- 
bad Sociability scale and the difference score 
on this scale both correlated with Wiggins’ 
Sd scale (r’s were —.25 for the fake-bad 
scale and —.23 for the difference score); and 
the fake-good Emotional Stability scale cor- 
related .20 with the Marlowe-Crowne SD 
scale. 


Correlations of SD and Defensiveness Mea- 
sures with Standard GZTS Scales 


The product-moment correlations of the 
SD response style and defensiveness measures 
with the GZTS scales administered with stan- 
dard instructions appear in Table 6. 

The Sociability, Emotional Stability, and 
Personal Relations scales correlated signifi- 
cantly with all the SD measures; high scores 
on these GZTS scales were positively related 
to socially desirable responding. The three 
correlated with Edwards’ SD scale (r’s were 
.46, .75, and .43), Stricker's SD scale (r's
were .38, .66, and .55), the Wb scale (r's
were .43, .62, and .41), the K scale (r's were
.42, .58, and .44), the F scale (r's were −.25,
—.44, and —.38), and the F — K index (r’s 
were —.42, —.62, and —.49). The General 
Activity scale correlated with only one of 
these response style measures—Edwards’ SD 
scale (r = .22). 
TABLE 6

CORRELATIONS OF RESPONSE STYLE MEASURES WITH STANDARD GZTS SCALES

                       Edw.     Str.                                           Wig.     M-C
GZTS Scale             SD       SD       Wb       K        F        F − K      Sd       SD       L
General Activity       .22*     .05      .15      .03     −.02     −.03        .27**    .07     −.01
Sociability            .46**    .38**    .43**    .42**   −.25*    −.42**      .35**    .31**    .25*
Emotional Stability    .75**    .66**    .62**    .58**   −.44**   −.62**      .19      .42**    .24*
Personal Relations     .43**    .55**    .41**    .44**   −.38**   −.49**      .06      .29**    .26**

* p < .05
** p < .01


The GZTS scales also correlated with the
defensiveness scales; high scores on the GZTS
scales were positively related to defensiveness,
but the correlations were lower in level and
less consistent: The General Activity and
Sociability scales correlated with Wiggins' Sd
scale (r's were .27 and .35); and the Sociability,
Emotional Stability, and Personal
Relations scales correlated with both the
Marlowe-Crowne SD scale (r's were .31, .42,
and .29) and the L scale (r's were .25, .24,
and .26).


DISCUSSION


Prevalence of Test-Wiseness


The high level of test-wiseness displayed in
this study was striking. These results are
consistent with those obtained previously.
Marked shifts in scores have been routinely
observed in role-playing studies (Dahlstrom
& Welsh, 1960; Ellis, 1953; Waters, 1965),
indicating considerable skill in this activity.
And, in Edwards' (1965) investigation, Ss'
desirability ratings predicted SD scale values
with great accuracy. In the face of the
considerable test-wiseness that exists, it is
noteworthy that wide and reliable individual
differences were observed in this study, as well
as in earlier ones.

In evaluating the present findings, it should
be noted that Ss, prior to the study, probably
resembled most college students in the extent
of their exposure to personality inventories
and to other experiences that might produce
test-wiseness. Although high already, their
test-wiseness, like other abilities, could
probably be improved by appropriate training
(Ebel & Damrin, 1960). The relationship
between amount of course work in psychology
and Estimating Communality, though modest,
suggests that the ability tapped by this
instrument, at least, is amenable to training.


Generality of Test-Wiseness 


Another important finding was the spec- 
ificity of the test-wiseness measures: The 
different kinds of instruments were unrelated, 
with one minor exception, though there was 
some generality among the role-playing variables.¹⁶
This finding implies that test-wiseness
is not a broad, general ability, but consists of 
a set of distinct and largely unrelated skills. 
The confirmation of this inference requires 
studies of other populations of Ss, especially 
more naive ones who are known to have no 
experience with personality inventories, and 
by appraisals of the test-wiseness measures’ 
construct validity. The logical possibility 
exists that the abilities tapped by the mea- 
sures simply did not reflect test-wiseness or 
only sampled limited aspects of it. Indeed, the 
particular abilities that were studied prob- 
ably do not exhaust those in the test-wiseness 
domain; still, they do seem to be a good 
representation of this domain, with the pos- 
sible exception of the variable tapped by 
Ability to Identify Items, which functioned in 
unexpected ways. The operation of this measure
will be discussed in more detail subsequently.


16 Some of the relationships among the role-playing
variables may be due, in part, to shared variance
introduced by using the same instructions or item
content for two or more variables. In contrast, all
the relationships with and among the other
test-wiseness measures were heterotrait, heteromethod
(Campbell & Fiske, 1959), for these instruments differed
among themselves and from the role-playing
variables in both instructions and item content.
(Different, but essentially equivalent, sets of items
were used in Estimating Desirability and Estimating
Communality.)

Especially puzzling was the role-playing 
measures’ complete independence of Estimat- 
ing Desirability and Estimating Communality. 
The results obtained by Wiggins (1966), in 
a similar investigation, had generally been 
negative, too. The present study and Wig- 
gins’ study point to the conclusion that ac- 
curacy in estimating desirability and com- 
munality is not an important ingredient in 
role-playing success. A minimal degree of 
accuracy may be essential, but this is prob- 
ably exceeded by most Ss, judging from the 
extent of test-wiseness that has been ob- 
served. Very special skills may be involved in 
projecting oneself into a role. The role in 
this study, for example, had two distinct re- 
quirements that had to be mastered, and these 
demanded a combination of test-wiseness and 
role knowledge. One was assuming the guise 
of an applicant for college admission. This 
necessitated accurate perception of the gen- 
eral range of responses appropriate for col- 
lege-age high school graduates, as contrasted, 
for example, with those associated with in- 
stitutionalized psychiatric patients. It is likely 
that Ss varied in the accuracy of these per- 
ceptions. The second was determining the kind 
of performance on the inventory that would 
impress or repel the admissions committee. 
Just as they would in a real life counterpart 
of this situation, Ss taking this role undoubt- 
edly differed in their views of what would 
impress or repel the committee and in their 
ability to respond in a way that would achieve 
the effect that they desired. For these reasons 
the responses elicited by such role-playing 
instructions are apt to be considerably more 
complex than, and sharply divergent from, 
judgments of social desirability or communal- 
ity. One indication of such a divergence is 
that role-playing responses typically vary for 
different roles that are similar in overall favor- 
ability (Hedberg, 1962; Krug, 1958; Nor- 
man, 1963; Rusmore, 1956; Wesman, 1952; 
Yonge & Heist, 1965), indicating that role- 
playing is not based on a generalized notion 
of social desirability or communality. In rela- 
tively uncomplex role-playing situations, the 
responses that are elicited may have a greater 
resemblance to judgments of social desira- 


bility and communality—the simplest role- 
playing instructions explicitly or implicitly 
solicit these judgments. In such situations, 
some correspondence, perhaps a great deal, 
would be expected between role-playing suc- 
cess and accuracy in estimating these item 
characteristics. 


Test-Wiseness and Performance on Standard 
Personality Scales 


The Ss had the ability to distort their 
scores on personality scales, as gauged from 
the extent of test-wiseness that was observed, 
but the limited relationships between the test- 
wiseness measures and the GZTS personality 
scales imply that test-wiseness was not a 
major source of dissembling. It seems rea- 
sonable to assume that at least some dis- 
sembling occurred on the personality scales, 
and the strong possibility that the distortion 
was considerable is suggested by the scales’ 
extensive correlations with the response style 
measures (Edwards, 1957; Jackson & Mes- 
sick, 1958), though these correlations may be 
open to less sinister, substantive interpreta- 
tions, too (Block, 1965). The failure of the 
accuracy measures and Ability to Identify 
Items to correlate with scores on the fake- 
good and fake-bad scales suggests that the 
same kind of negative results would also be 
observed in real life situations where greater 
incentives to distort exist. This suggestion is 
extremely tenuous, however, because of the 
uncertain correspondence between such role- 
playing scores and those obtained in real life. 

The general lack of relationship between the 
test-wiseness measures and the standard per- 
sonality scales raises the obvious question: 
Why wasn’t test-wiseness closely linked with 
distortion on these scales? At this juncture, 
this question cannot be answered with any 
certainty. One plausible answer is that test- 
wiseness did, indeed, produce distortion, but 
this relationship could not be uncovered in 
this study. The link between the two could 
have been obscured by the heterogeneity of
Ss, who may have consisted of (a) those
unmotivated to distort, (b) those motivated to
distort in a favorable direction, and (c) those 
motivated to distort in an unfavorable direc- 
tion—Ss in the last two groups varying in 
the extent of their motivation. However, the 
general absence of differences between the
correlations of the test-wiseness measures with
the personality scales for those with high and
those with low levels of SD response style
and defensiveness argues against such an
explanation, insofar as the response style
measures reflect the extent of dissembling and the
direction that it takes. These moderator-variable
analyses are hampered, though, by the
circularity introduced through the conceptual
and empirical linkage between the response
style and test-wiseness variables, and by the
blurring of distinctions among the three possible
kinds of Ss that was produced by dichotomizing
the score distributions for the
response style measures. Another possibility,
as mentioned previously, is that the test-wise- 
ness measures lacked construct validity. If, in 
fact, test-wiseness was not a major source of 
distortion, no compelling explanation can be 
offered for why this should be so. One specu- 
lation, difficult to verify, is that dissembling 
is largely unconscious, and, for this reason, 
not dependent on the skills involved in the 
test-wiseness measures. 

In the face of the generally negative results 
that were obtained, the frequent, though 
moderate, correlations between the role-play- 
ing measures and their own standard per- 
sonality scales provide an interesting excep- 
tion. These correlations, when considered together
with the overall lack of correlation of
these same role-playing measures with other 
standard personality scales, furnish additional 
evidence for the specificity of test-wiseness 
that was already observed in the intercorrela- 
tion analysis. The present findings indicate 
that role-playing success on a particular scale, 
even when it was related to performance on
the same scale administered with standard 
instructions, did not generalize to perform- 
ance on other standard personality scales. 


Test-Wiseness and Other Variables 


All of the different kinds of test-wiseness 
measures, including Estimating Desirability, 
correlated extensively with the SD measures, 
though Edwards (1965) had found that a 
rough measure of accuracy in desirability
estimation was uncorrelated with an SD scale.
These extensive correlations contrast sharply
with the scanty correlations of the test-wiseness
measures with the defensiveness scales.
The explanation may lie in differences in the 
item composition of the two kinds of response 
style scales (Jackson & Messick, 1962). The 
SD scales contain items with comparable so- 
cial desirability and communality—the de- 
sirable items are frequently endorsed and the 
undesirable items are rarely endorsed. The 
items on the defensiveness scales typically 
have discrepant desirability and communality 
—the desirable items are seldom endorsed 
(e.g., “I read in the Bible several times a 
week.”) and the undesirable ones are often 
endorsed (e.g., “At times I feel like swear- 
ing.”). Such defensiveness items are deviant 
because of the high relationship between social 
desirability and communality, and are atypi- 
cal of most items found on personality in- 
ventories, including the GZTS scales, as well 
as on Estimating Desirability, Estimating 
Communality, and Ability to Identify Items. 
The complex and unusual interaction of de- 
sirability and communality on the defensive- 
ness items, their rareness, and their under- 
representation on the test-wiseness measures 
should attenuate the correlations of the
defensiveness scales with the test-wiseness
measures. A different explanation of these
findings is that the defensiveness scales may 
tap a characteristic that is under less con- 
scious control than the trait measured by the 
SD scales. Socially desirable responding may 
entail a relatively systematic consideration of 
the items’ desirability and communality, ac- 
counting for its link with the accuracy mea- 
sures; defensiveness, because it may not in- 
volve such considerations, is unrelated to 
knowledge of these item characteristics. 

The finding that the various kinds of test- 
wiseness measures correlated with the SD 
scales, but, in general, did not correlate with 
each other suggests that different processes 
may underlie their relationships with this re- 
sponse style. One conjecture is that the link 
with Estimating Desirability and Estimating 
Communality, two measures that were re- 
lated, may be causal, as indicated in the pre- 
ceding discussion: Accurate knowledge of the 
desirability and communality of personality 
items is a prerequisite for choosing and mak- 
ing desirable and common responses. In con- 
trast, the relationship between this response 
style and role-playing may occur because they
are symptomatic of the same personality
dynamics.

The tie between the test-wiseness measures 
and SD response style may also explain why 
most of the test-wiseness measures were re- 
lated to the Emotional Stability scale, but un- 
related to the other standard personality 
scales. The Emotional Stability scale also cor- 
related with the SD measures; indeed, it was 
more highly correlated than any of the other 
GZTS scales with all the SD measures. As a 
result, the relationship between the test-wise- 
ness measures and this personality scale may 
have been produced by the response style 
variance common to them. In fact, eliminat- 
ing, by partial-correlation procedures, the SD 
response style variance shared by the test- 
wiseness measures and this personality scale 
markedly attenuated the measures' correlations
with the scale.¹⁷
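
The partial-correlation step described above can be sketched as a first-order partial correlation in which the correlations with the control variable are first corrected for that variable's unreliability. This is one common reading of such a correction, not a reproduction of the author's computation. The function name is hypothetical, and the input correlations are stand-ins taken from the values reported for Edwards' SD scale, because the correlations with the composite measure are not given in this section; only the .91 reliability belongs to the composite.

```python
import math

def partial_r(r_xy, r_xc, r_yc, control_reliability=1.0):
    """Partial correlation of x and y with control c held constant.

    The correlations with c are first divided by sqrt(reliability of c),
    i.e. corrected for attenuation in the control variable only.
    """
    r_xc /= math.sqrt(control_reliability)
    r_yc /= math.sqrt(control_reliability)
    return (r_xy - r_xc * r_yc) / math.sqrt((1 - r_xc**2) * (1 - r_yc**2))

# Stand-in values: Estimating Desirability with Emotional Stability (.21),
# each with Edwards' SD scale (.26 and .75); .91 is the composite's reliability.
value = partial_r(0.21, 0.26, 0.75, control_reliability=0.91)
print(round(value, 3))
```

With these inputs the partial correlation falls close to zero, which illustrates how a modest zero-order correlation can disappear once shared response style variance is removed.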

The general lack of relationship between 
the test-wiseness devices and the ability tests 
and cognitive style measures in this study 
suggests that the test-wiseness skills exist in- 
dependently of these other variables. The gen- 
erality of this inference needs to be established 
by studies of other groups of Ss. This finding 
is consistent with previous reports that role- 
playing success and intelligence were uncor- 
related (Kelly et al., 1936; Kimber, 1947; 
Noll, 1951). Although it would be useful to 
investigate the relationship of the test-wiseness 
measures to a broader spectrum of such 
variables, the abilities and cognitive styles 


17 In this analysis, composite measures of SD
response style and defensiveness were separately
partialed out of all of the correlations between the
test-wiseness measures and the GZTS personality
scales administered with standard instructions. In
computing these partial correlations, the correlations 
with the control variable were corrected for attenua- 
tion in this variable to prevent the undercorrection 
of the other variables arising from the unreliability 
of the control variable (Kahneman, 1965). 

The only noticeable difference between the original 
correlations and the partial correlations was that 
most of the significant correlations of the test-wise- 
ness measures with the Emotional Stability scale 
were reduced to nonsignificant levels when SD re- 
sponse style was held constant. A table containing 
all the partial correlations has been deposited with 
the National Auxiliary Publications Service. See 
Footnote 5 for ordering information. 


already studied were specifically chosen for 
their manifest relevance to test-wiseness, re- 
ducing the likelihood that further work along 
this line will uncover links between test-wise- 
ness and other abilities or cognitive styles. 


Functioning of Ability to Identify Items 


The deviant pattern of correlations for 
Ability to Identify Items (e.g., its negative 
correlations with the standard Emotional Sta- 
bility scale and the SD measures in contrast 
with the positive or nonsignificant correla- 
tions of the other test-wiseness measures with 
these variables) suggests that the ability it 
measures is not an element of test-wiseness, at 
least in the way that this phenomenon was 
originally conceived. This instrument’s cor- 
relations with the other variables may have 
simply occurred for reasons unrelated to test- 
wiseness (e.g., its negative correlations with 
the Emotional Stability scale and the SD 
measures could be produced because the 
analytic people who do well on this instru- 
ment may also be introspective and overly 
sensitive to their own personality defects). 


Overview 


In considering the larger meaning of this 
study, the main theme that emerges is the 
failure to implicate test-wiseness as an im- 
portant source of distortion on personality 
scales. Pending subsequent confirmation, it 
appears that test-wiseness, despite its preva- 
lence, is not a functional unity and is not 
extensively linked to performance on inven- 
tory measures of personality. Additionally, 
test-wiseness seems to be unrelated to other 
potentially relevant variables, apart from its 
interesting tie with SD response style. These 
results raise a variety of provocative issues 
about the causes of distortion and the proc- 
esses associated with test-wiseness. From a 
broader perspective, the findings underscore 
how little is currently known about the determinants
of responses on self-report devices.